Bio-Inspired Neural Network Dynamics-Aware Reinforcement Learning for Spiking Neural Network
Abstract
1. Introduction
2. Related Works
3. Method
3.1. Overview
- Consider an SNN as a collection of neurons, so that the network's behavior is the collective result of numerous agents. Since the weight of each synapse is the focal point of training, each synapse is regarded as an agent.
- Regard the entire SNN as a single agent, and apply reinforcement learning to an SNN that has first been trained with a backpropagation-like method.
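To make the two framings concrete, here is a minimal Python sketch (not the authors' implementation); the SynapseAgent class, its three-way action set, and the placeholder history-based policy are illustrative assumptions:

```python
import random

# Strategy A in miniature: every synapse is its own RL agent that nudges
# its weight and shares one global reward. Names and the toy policy here
# are assumptions for illustration only.
class SynapseAgent:
    ACTIONS = (-0.01, 0.0, +0.01)           # decrease / keep / increase weight

    def __init__(self, weight=0.0):
        self.weight = weight
        self.history = []                   # (action, reward) pairs

    def act(self, epsilon=0.1):
        # Epsilon-greedy over the two most recent rewarded actions.
        if self.history and random.random() > epsilon:
            action = max(self.history[-2:], key=lambda ar: ar[1])[0]
        else:
            action = random.choice(self.ACTIONS)
        self.weight += action
        return action

    def remember(self, action, reward):
        self.history.append((action, reward))

# One training step: all synapse-agents act, then receive the same global
# reward (e.g., how many frames the inverted pendulum stayed balanced).
synapses = [SynapseAgent() for _ in range(4)]
reward = 60.0
for syn in synapses:
    syn.remember(syn.act(), reward)
```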
3.2. Strategy A: Regard Each Synapse as an Agent
3.2.1. Action
3.2.2. State
- Historical actions and the corresponding rewards: Including past actions and their rewards means that a synapse chooses its next action based on how effective earlier actions were, which can also carry past errors into future decisions. Bhargava et al. incorporate the two most recent actions and their associated rewards into a learning method, yielding a strategy [12] akin to gradient descent. Our work therefore also factors historical actions and rewards into the state-space design.
- Current synaptic weight: The present synaptic weight holds crucial information for deciding the next weight-update action. From both computational and biological perspectives, any learning strategy that neglects the current synaptic weights is fundamentally incomplete.
- The topology information of the synapse in the SNN: This is the innovation of our work. To illustrate the logic behind this aspect, consider cellular development in biology, for instance human embryonic cells: initially, most of these cells are essentially uniform, yet differing stimuli in different regions drive cellular differentiation. Biological research likewise shows that distinct regions of the human brain serve different functions. Excluding location therefore fails to explain why brain neurons with identical structures develop varied functions and structures depending on where they sit.
- Model 1: Record the two previous actions $a$ and their corresponding rewards $r$ to construct the state space $S$ as $S = \{a_{t-1}, r_{t-1}, a_{t-2}, r_{t-2}\}$;
- Model 2: Record the two previous actions and rewards plus the current synaptic weight $w_t$, giving $S = \{a_{t-1}, r_{t-1}, a_{t-2}, r_{t-2}, w_t\}$;
- Model 3: Record the two previous actions and rewards plus the synapse's topology location $l$, giving $S = \{a_{t-1}, r_{t-1}, a_{t-2}, r_{t-2}, l\}$;
- Model 4: Record the two previous actions and rewards, the current synaptic weight $w_t$, and the topology location $l$, giving $S = \{a_{t-1}, r_{t-1}, a_{t-2}, r_{t-2}, w_t, l\}$;
- Model 5: Record only the current synaptic weight $w_t$ and the topology location $l$, giving $S = \{w_t, l\}$ (all five constructions are sketched in code below).
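The five designs differ only in which ingredients they concatenate. A minimal Python sketch of all five constructions follows; the scalar weight and the (layer, index) encoding of the topology location are assumptions, since the paper's exact encodings are not reproduced in this outline:

```python
from collections import deque

# Build the state vector for each of the five models. The (layer, index)
# topology encoding is an illustrative assumption.
def make_state(model, history, weight, location):
    """history: deque of the last two (action, reward) pairs."""
    (a1, r1), (a2, r2) = history            # two most recent steps
    if model == 1:
        return (a1, r1, a2, r2)
    if model == 2:
        return (a1, r1, a2, r2, weight)
    if model == 3:
        return (a1, r1, a2, r2, *location)
    if model == 4:
        return (a1, r1, a2, r2, weight, *location)
    if model == 5:
        return (weight, *location)
    raise ValueError(f"unknown model: {model}")

history = deque([(+0.01, 0.3), (-0.01, 0.8)], maxlen=2)
print(make_state(4, history, weight=0.5, location=(1, 7)))
```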
3.2.3. Reward
3.2.4. Policy
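The full policy specification is not reproduced in this outline, but Section 4.5.1 indicates that a DQN with ε-greedy exploration drives action selection. A minimal sketch of ε-greedy selection, with a plain list of Q-values standing in for the DQN's Q-network output:

```python
import random

# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise exploit the action with the highest estimated Q-value.
def epsilon_greedy(q_values, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit

# Example: Q-estimates for three weight-update actions.
print(epsilon_greedy([0.2, 0.9, 0.1]))  # usually returns index 1
```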
3.3. Strategy B: Regard SNN as an Agent
4. Experimental Results and Discussion
4.1. Experimental Framework
4.2. Decide Learning Rate and Simulation Time T
4.3. Result for Strategy A
- (1) Model 1 demonstrates relatively superior search performance when the hidden layer contains 8 neurons; the inverted pendulum stays balanced for approximately 60 frames;
- (2) Model 2 finds its best strategy when the hidden layer is configured with 32 neurons; the inverted pendulum remains balanced for about 48 frames;
- (3) Model 3 demonstrates optimal search performance when the hidden layer contains 32 neurons, with minimal differences among 4, 8, and 16 neurons;
- (4) Model 4 performs better with a hidden layer of 8 or 16 neurons; increasing the neuron count to 32 reduces its search performance;
- (5) Model 5 performs best with 4 or 8 neurons in the hidden layer; with 8 neurons, the inverted pendulum maintains balance for about 62 frames, while adding more neurons reduces its effectiveness.
- (1) With 4 neurons in the hidden layer, Model 1 and Model 3 initially reach a peak before dropping to lower levels, suggesting a transition from a heuristic search to a more exhaustive one; Model 2 and Model 4 fail to steer the weight updates of the spiking neural network; Model 5, the simplest design because it omits the second-order Markov states, exhibits the best search performance and retains a pronounced heuristic effect in the searches that follow its peak;
- (2) With 8 neurons in the hidden layer, all five models show some degree of search effect;
- (3) With 16 neurons in the hidden layer, none of the five models reaches its optimal search effect;
- (4) With 32 neurons in the hidden layer, Model 2 and Model 3 show the best results.
4.4. Result for Strategy B
4.5. Discussion
4.5.1. For Strategy A
- The model employs a single-step update strategy, so an excessive number of hidden-layer neurons (e.g., 16 or 32) can flood the reinforcement learner with irrelevant information during the early stages of training, disrupting the DQN's ability to memorize and select high-reward actions. The SNN functions as a cohesive unit: when the ε-greedy strategy navigates the state space, optimizing some parts of the network can simultaneously degrade others. Consequently, increasing the number of hidden-layer neurons can lower the probability of selecting actions that improve the network as a whole, reducing the efficiency of the reinforcement learning training algorithm.
- Model 1's state-space design accounts only for past actions and their associated rewards, without factoring in synaptic weights or spatial information. It depends entirely on second-order Markov relations for reinforcement learning, which may cause substantial information loss in large networks. If this observation holds, it implies that in large-scale networks, where different segments may take on different functions, differentiation is necessary; in such scenarios, topological information becomes an essential ingredient of the learning process.
- It is possible that the single-step update strategy is simply inefficient. In future work we intend to replace it with the Monte Carlo update method; the difference between the two update targets is sketched below.
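For reference, a minimal sketch of the two update targets under discussion; the discount factor and the reward trace are illustrative values, not the paper's settings:

```python
# Single-step (TD(0)) target: bootstrap from the next state's estimated value.
def one_step_target(reward, gamma, next_value):
    return reward + gamma * next_value

# Monte Carlo target: the full discounted return of a finished episode,
# so no bootstrapping from intermediate value estimates.
def monte_carlo_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(one_step_target(1.0, 0.99, 5.0))            # 5.95
print(monte_carlo_return([1.0, 1.0, 1.0], 0.99))  # 2.9701
```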
4.5.2. For Strategy B
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Gao, H. Dynamic neural networks: Advantages and challenges. Natl. Sci. Rev. 2024, 8, nwae088.
- Han, Y.; Huang, G.; Song, S.; Yang, L.; Wang, H.; Wang, Y. Dynamic Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7436–7456.
- Salhab, W.; Ameyed, D.; Jaafar, F.; Mcheick, H. A Systematic Literature Review on AI Safety: Identifying Trends, Challenges, and Future Directions. IEEE Access 2024, 12, 131762–131784.
- Xu, Q.; Xie, W.; Liao, B.; Hu, C.; Qin, L.; Yang, Z.; Xiong, H.; Lyu, Y.; Zhou, Y.; Luo, A. Interpretability of Clinical Decision Support Systems Based on Artificial Intelligence from Technological and Medical Perspective: A Systematic Review. J. Healthc. Eng. 2023, 2023, 9919269.
- Roy, K.; Jaiswal, A.; Panda, P. Towards spike-based machine intelligence with neuromorphic computing. Nature 2019, 575, 607–617.
- Bohte, S.M.; Kok, J.N.; La Poutre, H. Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing 2002, 48, 17–37.
- Hopfield, J.J. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA 1984, 81, 3088–3092.
- Hebb, D.O. The Organization of Behavior: A Neuropsychological Theory; Psychology Press: East Sussex, UK, 1968.
- Frémaux, N.; Gerstner, W. Neuromodulated Spike-Timing-Dependent Plasticity, and Theory of Three-Factor Learning Rules. Front. Neural Circuits 2016, 9, 85.
- Masquelier, T.; Thorpe, S.J. Learning to recognize objects using waves of spikes and Spike Timing-Dependent Plasticity. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010.
- Zhang, R.; Wang, Z.; Zheng, M.; Zhao, Y.; Huang, Z. Emotion-sensitive deep dyna-Q learning for task-completion dialogue policy learning. Neurocomputing 2021, 459, 122–130.
- Bhargava, A.; Rezaei, M.R.; Lankarany, M. Gradient-Free Neural Network Training via Synaptic-Level Reinforcement Learning. AppliedMath 2022, 2, 185–195.
- Akopyan, F.; Sawada, J.; Cassidy, A.; Alvarez-Icaza, R.; Arthur, J.; Merolla, P.; Imam, N.; Nakamura, Y.; Datta, P.; Nam, G.; et al. TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2015, 34, 1537–1557.
- Hu, Y.; Tang, H.; Pan, G. Spiking Deep Residual Network. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 5200–5205.
- Sengupta, A.; Ye, Y.; Wang, R.; Liu, C.; Roy, K. Going Deeper in Spiking Neural Networks: VGG and Residual Architectures. Front. Neurosci. 2019, 13, 95.
- Han, B.; Srinivasan, G.; Roy, K. RMP-SNN: Residual Membrane Potential Neuron for Enabling Deeper High-Accuracy and Low-Latency Spiking Neural Network. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Hwang, S.; Kung, J. One-Spike SNN: Single-Spike Phase Coding With Base Manipulation for ANN-to-SNN Conversion Loss Minimization. IEEE Trans. Emerg. Top. Comput. 2025, 13, 162–172.
- Aydin, A.; Gehrig, M.; Gehrig, D.; Scaramuzza, D. A Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024.
- Shariff, W.; Kielty, P.; Lemley, J.; Corcoran, P. Face Detection Using Hybrid SNN-ANN to Process Neuromorphic Event Stream. IEEE Access 2025, 13, 9844–9856.
- Zheng, H.; Wu, Y.; Deng, L.; Hu, Y.; Li, G. Going Deeper With Directly-Trained Larger Spiking Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021.
- Fang, W.; Yu, Z.; Chen, Y.; Huang, T.; Masquelier, T.; Tian, Y. Deep Residual Learning in Spiking Neural Networks. Adv. Neural Inf. Process. Syst. 2021, 34, 21056–21069.
- Yu, Q.; Tang, H.; Tan, K.C.; Li, H. Precise-Spike-Driven Synaptic Plasticity: Learning Hetero-Association of Spatiotemporal Spike Patterns. PLoS ONE 2013, 8, e78318.
- Kasabov, N.; Dhoble, K.; Nuntalid, N.; Indiveri, G. Dynamic evolving spiking neural networks for on-line spatio- and spectro-temporal pattern recognition. Neural Netw. 2013, 41, 188–201.
- Taherkhani, A.; Belatreche, A.; Li, Y.; Maguire, L. A new biologically plausible supervised learning method for spiking neurons. In Proceedings of the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium, 23–25 April 2014.
- Guo, L.; Wang, Z.; Adjouadi, M. A novel biologically plausible supervised learning method for spiking neurons. In Proceedings of the 2015 World Congress in Computer Science, Computer Engineering, & Applied Computing, International Conference on Artificial Intelligence (ICAI'15), San Francisco, CA, USA, 21–23 October 2015.
- Liu, D.; Yue, S. Visual pattern recognition using unsupervised spike timing dependent plasticity learning. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016.
- Iyer, L.R.; Basu, A. Unsupervised learning of event-based image recordings using spike-timing-dependent plasticity. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017.
- Joseph, P.; Marthe, G.; Goursaud, C. STDP Training Design and Performances of a SNN for Sequence Detection as a Wake Up Radio in a IoT Network. In Proceedings of the 2025 IEEE Wireless Communications and Networking Conference (WCNC), Milan, Italy, 24–27 March 2025.
- Saranirad, V.; Dora, S.; McGinnity, T.M.; Coyle, D. CDNA-SNN: A New Spiking Neural Network for Pattern Classification Using Neuronal Assemblies. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 2274–2287.
- Du, S.; Zhu, H.; Zhang, Y.; Hong, Q. An Encoder–Decoder Model Based on Spiking Neural Networks for Address Event Representation Object Recognition. IEEE Trans. Cogn. Dev. Syst. 2025, 17, 1286–1300.
- Zheng, N.; Mazumder, P. Hardware-Friendly Actor-Critic Reinforcement Learning Through Modulation of Spike-Timing-Dependent Plasticity. IEEE Trans. Comput. 2017, 66, 299–311.
- Lee, K.; Kwon, D.S. Synaptic plasticity model of a spiking neural network for reinforcement learning. Neurocomputing 2008, 71, 3037–3043.
- Mahadevuni, A.; Li, P. Navigating mobile robots to target in near shortest time using reinforcement learning with spiking neural networks. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017.
- Kiselev, M. A Spiking Neural Network Structure Implementing Reinforcement Learning. arXiv 2022, arXiv:2204.04431.
- Sutton, R.; Barto, A. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998.
- Watkins, C.J.C.H.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Peters, J.; Bagnell, J.A.; Sammut, C. Policy gradient methods. In Encyclopedia of Machine Learning; Springer: Berlin/Heidelberg, Germany, 2010; Volume 11.
- Nahmias, M.A.; Shastri, B.J.; Tait, A.N.; Prucnal, P.R. A Leaky Integrate-and-Fire Laser Neuron for Ultrafast Cognitive Computing. IEEE J. Sel. Top. Quantum Electron. 2013, 19, 1800212.
| Parameter | Description | Unit | Set to |
|---|---|---|---|
| τ_m | Membrane time constant | ms | 2.0 |
| V_th | Spike threshold | mV | 1.0 |
| V_reset | Reset potential of the membrane | mV | 0.0 |
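A minimal leaky integrate-and-fire simulation using the parameters from the table above (τ_m = 2.0 ms, V_th = 1.0 mV, V_reset = 0.0 mV); the constant input drive, time step, and forward-Euler integration scheme are assumptions for illustration:

```python
# LIF neuron with the tabulated parameters; input drive and dt assumed.
tau_m, v_th, v_reset = 2.0, 1.0, 0.0   # ms, mV, mV (from the table)
dt, steps, i_in = 0.1, 100, 1.5        # ms per step, step count, drive (mV)

v, spike_times = 0.0, []
for n in range(steps):
    v += dt / tau_m * (-(v - v_reset) + i_in)  # leaky integration (Euler)
    if v >= v_th:                              # threshold crossing: fire
        spike_times.append(n * dt)
        v = v_reset                            # reset membrane potential

print(f"{len(spike_times)} spikes in {steps * dt:.1f} ms")  # 4 spikes in 10.0 ms
```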
Balance duration of the inverted pendulum (in frames) achieved by each model for different hidden-layer sizes:

| No. of Hidden Layer Neurons | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|---|
| 4 | 22 | 34 | 35 | 26 | 57 |
| 8 | 60 | 30 | 30 | 44 | 62 |
| 16 | 32 | 42 | 32 | 42 | 32 |
| 32 | 38 | 48 | 59 | 32 | 41 |