
Designing Spiking Neural Network-Based Reinforcement Learning for 3D Robotic Arm Applications

1 Department of Computer Engineering, Kwangwoon University, Seoul 01897, Republic of Korea
2 Department of Information and Communication Engineering, Soonchunhyang University, Asan 31538, Republic of Korea
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2025, 14(3), 578; https://doi.org/10.3390/electronics14030578
Submission received: 31 December 2024 / Revised: 24 January 2025 / Accepted: 28 January 2025 / Published: 31 January 2025
(This article belongs to the Special Issue Deep Reinforcement Learning and Its Latest Applications)

Abstract

This study investigates a novel approach to robotic arm control by integrating spiking neural networks with the twin delayed deep deterministic policy gradient (TD3) reinforcement learning algorithm. Specifically, it presents the first application of spiking neural network-based TD3 to 3D robotic manipulation, demonstrating its extension from traditional 2D tasks to complex 3D target-reaching scenarios with improved energy efficiency and stability. Additionally, using inertial measurement unit data, the system successfully mimics human arm movements, achieving a success rate of 0.95 across 50 trials and enabling an intuitive and accurate human–robot interaction system. This pioneering attempt highlights the feasibility of combining biologically inspired spiking neural networks with reinforcement learning to address real-time challenges in high-dimensional robotic environments and to advance the field of human–robot interaction systems.

1. Introduction

A robotic system consists of hardware and software components interacting with its environment to perform given tasks autonomously [1]. This system is utilized across various applications and increasingly adopts biologically inspired mechanisms to improve its efficiency and adaptability, aiming to implement human-like capabilities [2]. In particular, robotic arms that mimic human hand gestures and arm movements hold significant potential in various human–robot interface (HRI) fields, including assistive devices, rehabilitation therapy, and human–robot collaboration technologies [3,4]. To mimic human arm movements, camera-based systems are often utilized to generate smooth motion trajectories of human movement recorded through motion capture [5]. However, traditional camera-based systems face limitations in tasks requiring real-time control and flexibility due to environmental constraints such as lighting and obstacles [6]. In contrast, the inertial measurement unit (IMU) captures real-time orientation (roll, pitch, yaw) of a human arm, enabling intuitive and responsive control by providing robust and efficient data necessary for determining the movement speed and direction of robotic arms, regardless of environmental conditions. Furthermore, combining IMU data with EMG (electromyography) enables more precise control [7]. IMU measures the real-time orientation of the human arm to guide the robotic arm’s movement speed and direction, while EMG data are used to analyze hand gestures for controlling the robotic arm’s gripper.
Robotic arms can perform a variety of tasks, such as pick-and-place, assembly, and object alignment [8,9]. Accurate target reaching, which involves positioning the robotic arm’s end-effector precisely at a specific goal point, is essential for the successful execution of these tasks and serves as a critical operation in many applications [10]. Traditionally, inverse kinematics (IK) has been employed to compute joint movements based on the target position [11]. During this process, the IMU provides the initial conditions and orientation data required for the IK calculations by measuring the real-time orientation of the human arm. IK is effective at calculating the joint movements needed to reach a target end-effector position [12].
However, IK has significant limitations in complex robotic systems. As the degrees of freedom (DoF) of robotic arms increase, the state and action spaces expand exponentially, making it difficult to identify the optimal joint configuration for achieving a specific target position [13]. This is a major drawback that makes it difficult for traditional control methods to respond in real time. Furthermore, the IK-based approach struggles to adapt to non-linear system dynamics and changes in complex working environments, limiting its ability to handle dynamic environmental conditions and task scenarios [14,15]. Consequently, such constraints can significantly reduce the flexibility and scalability of robotic systems.
To address these challenges, reinforcement learning (RL) has recently gained attention as an alternative to robotic arm control. RL could learn optimal control policies through the interaction with the environment and input data without explicit system models [16]. This enables the exploration of high-dimensional state and action spaces in systems with high DoF and provides adaptability to changes in the operating environment. RL overcomes the challenges of IK by leveraging its learning ability through interaction with the environment, eliminating the need for explicit mathematical models or iterative solvers. Traditional IK methods often struggle with issues such as computational inefficiency, sensitivity to singularities, and difficulties in generalizing to dynamic or high-dimensional scenarios [17]. RL, however, directly maps the relationship between an end-effector’s target pose and joint configurations through trial-and-error learning, guided by a reward mechanism. This allows RL to adaptively explore complex state-action spaces, find stable solutions even in singular configurations, and continuously refine its control policies based on the environmental feedback. Moreover, RL provides a framework for real-time, robust decision-making that could generalize across diverse tasks and workspaces, enabling efficient and accurate solutions for both position and orientation requirements in robotic systems [18,19,20,21,22,23]. These attributes make RL a promising approach for solving major challenges in robotic arm control, such as the complex motion planning, environmental adaptation, and real-time control [24]. Indeed, deep reinforcement learning (DRL), which applies deep neural networks (DNNs) to RL structure, has demonstrated remarkable results in high-dimensional problems, particularly in the fields like smooth motion planning and robotic navigation tasks using deep deterministic policy gradient (DDPG) [25,26,27].
However, DNN-based RL models consume enormous computational resources during learning and inference due to the complex structure of DNNs [28]. This high computational cost is a major limitation for robots that rely on limited onboard resources, such as battery-powered mobile robots or small embedded systems. High energy consumption shortens battery life, limiting the operating time. Additionally, the computational complexity of DRL hinders immediate responses to rapidly changing environments, reducing the reliability and stability of robotic arms, particularly in high-speed or unpredictable scenarios [29,30]. To overcome these limitations, spiking neural networks (SNNs), a biologically inspired third-generation neural network, have emerged as a promising alternative [31]. SNNs are designed on the biological principle that humans process information in the form of spikes, and they can provide low energy consumption and fast computation when combined with neuromorphic hardware [32]. These properties make SNNs a promising solution for energy-efficient AI applications in robotics, low-power IoT devices, and real-time control systems [33,34,35].
In particular, research has increasingly focused on the integration of SNNs with RL to improve energy efficiency, including studies combining DDPG with SNNs for robotic arm control, known as spiking DDPG (SDDPG) [36,37,38]. SDDPG has demonstrated energy efficiency and strong performance on robotic arm tasks such as reaching target positions, grasping objects, and transporting them, using a camera-based system that combines the structures of SNNs and DRL. However, the main drawbacks of DDPG include Q-value overestimation, reduced learning stability, unstable policy updates, and low exploration efficiency in continuous control environments. To address these shortcomings, twin delayed deep deterministic policy gradient (TD3) was proposed in DRL [39]. TD3 mitigates Q-value overestimation by employing two critic networks and improves learning stability and performance by delaying policy updates. The SNN architecture has also been applied to TD3, and recent studies have validated its performance on OpenAI Gym tasks [30,40].
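As a concrete illustration of the twin-critic mechanism, the sketch below shows how a TD3-style target value can be computed with clipped double-Q learning and target policy smoothing. The function and parameter names (pi_target, q1_target, q2_target, noise_std) are illustrative assumptions, not code from the cited works.

```python
import torch

def td3_target(reward, next_state, done, pi_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target with target policy smoothing (TD3-style sketch)."""
    # Target policy smoothing: perturb the target action with clipped noise
    next_action = pi_target(next_state)
    noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
    next_action = (next_action + noise).clamp(-max_action, max_action)
    # Taking the minimum of the two target critics counters Q-value overestimation
    target_q = torch.min(q1_target(next_state, next_action),
                         q2_target(next_state, next_action))
    return reward + gamma * (1.0 - done) * target_q
```

In the full algorithm, the actor and the target networks are updated only once every several critic updates, which is the delayed policy update that gives TD3 its name.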
To the best of our knowledge, this study represents the first attempt to mimic human arm movements using SNN-based RL incorporating IMU data, and it achieves the first implementation of 3D target-reaching tasks using SNN-based TD3 [41].
The contributions of this study are summarized as follows:
  • The first attempt to mimic human arm movements using SNN-based RL.
  • The first implementation of 3D target-reaching tasks using SNN-based TD3.

2. Related Work

In the study of robotic arms mimicking human arm movements, research has actively explored both IK methods and learning-based approaches utilizing neural networks, aiming to achieve natural and efficient control [3]. IK is a traditional method to control the robotic arms, enabling the computation of joint movements required for the arm to reach a target position [11]. Particularly, as the DoF of a robotic arm increase—allowing for more versatile movements—appropriate motion modelling becomes crucial [5], which helps the robot to be controlled efficiently while maintaining its natural posture. From this perspective, researchers have applied IK to optimize the criteria such as minimum jerk, energy efficiency, and collision avoidance, thereby generating smooth trajectories that resemble human movements [42,43,44,45]. However, IK-based approaches face several limitations in complex dynamic environments, leading to growing interest in learning-based methods as potential solutions [14,46]. Among these, approaches utilizing DNNs and DRL have gained significant attention for overcoming such challenges [15,24,47]. These methods often utilize human motion data obtained from motion capture systems or inertial measurement units (IMUs) to train models that could generate human-like movements [48]. In particular, reinforcement learning-based control has shown strengths in optimizing trajectories and learning human movement patterns through trial and error in simulation or real environments [49]. Such approaches enable more natural and efficient control in various applications, including humanoid robots, rehabilitation, and human–robot interaction (HRI). Consequently, their recent advancements have allowed robotic arms to autonomously learn and discover optimal motions in diverse environments.
With the development of these learning-based approaches, SNNs have recently received attention to solve the target-reaching problem of robotic arms [50,51,52,53]. While traditional IK-based methods have high computational complexity and modeling limitations, SNNs have the advantage of low energy consumption and excellent temporal information processing ability by imitating the operating principles of biological neural networks [46]. Therefore, SNNs could overcome the problems of the IK-based methods through a learning-based approach [54]. For example, SNNs dynamically learn the complex relationships between robotic joint movement and target position using biologically plausible learning rules like spike-timing-dependent plasticity (STDP) [50]. Zahra et al. [55] proposed a method inspired by the motor babbling process observed in human infants during early motor learning stages, focusing on simple target-reaching tasks in simulation without utilizing camera-based or IMU data. This approach generates the joint information in the stages of learning, enabling immediate control commands to be generated without pre-planning paths to target points [55]. Additionally, Polykretis et al. developed a model to mimic natural human movement trajectories by generating bell-shaped velocity profiles, allowing for smoother and more efficient motions, aiming to achieve target-reaching tasks within a simulation environment [56]. These approaches enable robots to operate effectively in environments involving human interaction or dynamic workspaces, generating movement trajectories that are more natural than those produced by traditional methods. Therefore, it enables robots to autonomously adapt and achieve goals in various environments and enables efficient and natural control beyond the traditional computational models based on mathematical methods.
The aforementioned DRL has been widely utilized as a robust learning-based approach for solving the target-reaching problem in robotic arms. However, like the IK-based approach, it also faces limitations such as high computational costs and a lack of energy efficiency. To address these challenges, a hybrid approach combining SNNs and DRL, known as spiking DDPG (SDDPG), has been proposed [36,37,38]. By taking advantage of the energy efficiency, fast inference capability, and real-time learning potential of SNNs, which mimic the operation of biological neurons, its combination with DRL effectively reduces computational costs and improves energy efficiency [57,58]. As a result, SDDPG enables autonomous and efficient learning in complex workspaces [36]. SDDPG adopts an actor–critic structure, where the actor network is implemented with SNNs and the critic network uses DNNs. The spiking actor network (SAN) is responsible for generating actions as a policy, while the deep critic network (DCN) evaluates the actions of the SAN using value functions or Q-values [37,38]. To address the non-differentiable nature of SNNs [59], the SAN is trained using the surrogate gradient descent learning algorithm, achieving fast learning and high accuracy. During the training stage, the DCN is utilized to evaluate the actor network, but it is not employed during the testing stage. Consequently, after training, the SAN can operate independently on neuromorphic hardware. This enables the SAN to perform real-time control with limited onboard resources, allowing high-DoF robotic arms to detect objects using a camera-based system and reach target points in diverse simulation environments. It has demonstrated superior reaching accuracy and trajectory optimization compared to conventional DDPG models [38].
Although SDDPG successfully combines the advantages of DRL and SNNs to enable energy-efficient and stable robotic control in complex environments, it inherits limitations such as Q-value overestimation and policy instability from DDPG, which could impact the learning stability of SNN-based systems [60]. To overcome these issues, TD3 has been proposed, and when combined with SNNs, it shows potential to simultaneously stabilize Q-values and improve exploration efficiency [39].
This study represents the first attempt to mimic human arm movements through SNN-based reinforcement learning, incorporating IMU data. Furthermore, we implement 3D target-reaching tasks using SNN-based TD3 for the first time, demonstrating that the fusion of DRL and SNNs enables energy-efficient and stable control in complex environments. Following up on the achievements of previous research, this approach emphasizes the effectiveness of combining SNNs and DRL for high-dimensional problems such as robotic arm control.

3. Materials and Methods

We performed the target-reaching task using a 3-DoF robotic arm that we implemented independently using OpenAI Gym and its successor Gymnasium [61,62], SpyTorch [63], an extension of the widely used PyTorch deep learning framework, and the Mujoco physics simulator [64]. As shown in Figure 1, Mujoco’s Reacher-v4 environment was customized by extending the joint structure, modifying the target position configuration, and adjusting the reward structure. The newly designed environment expands the task from 2D space to 3D, enabling reinforcement learning agents to perform the target-reaching task in more complex environments.
The robotic arm in the complex 3D space was successfully controlled by adopting the existing method that applies SAN in the actor–critic framework of the TD3 algorithm based on DRL, and the surrogate gradient approach was utilized to address the continuous control problems [41]. This approach is illustrated in Figure 2. The two-neuron encoder for the input environmental state values, introduced by Pérez-Carrasco et al. [65], was employed to encode the input values into binary neurons [41]. This method converts the negative input values into spike signals, expanding the n dimension of states to 2n. Each state value is mapped to two neurons: one activates when the state value is positive, while the other activates when it is negative. Consequently, for any given state value, the neuron representing the other state remains inactive, ensuring the independent firing between the two neurons. Therefore, this method has the advantage of providing a stable learning process for the SNNs without directly converting the input value to spikes, and its significant performance has been proven even in a complex reinforcement learning environment combining DRL and SNNs [41,65,66]. The encoded spikes are passed to the hidden layers of the SNNs, which consist of integrate-and-fire (IF) neurons [67]. These neurons process their input currents over time, updating the membrane potentials. When the membrane potential surpasses a threshold due to their accumulated currents, a spike signal is emitted to stimulate subsequent neurons. A membrane potential decoder [41] interprets the cumulative membrane potential of the output layer neurons over time, and the continuous robot arm control movements are generated based on this. To enhance the stability in continuous robot arm control, we adopted the approach proposed by Akl et al. [41], disabling the spiking mechanism in the output neurons by setting their firing threshold to infinity. This approach allows the membrane potentials to be accumulated without spike firing, and the final values, calculated as the weighted sum of incoming spikes with no decay, are directly utilized as continuous actions. Additionally, the decoded membrane potentials are scaled to fit within the joint torque limits of the robotic arm, ensuring compatibility with the physical system constraints.
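The encoding and decoding scheme described above can be summarized with a minimal sketch. This is not the authors’ SpyTorch implementation: the layer sizes, time-step count, and the final scaling function are placeholder assumptions, and only a single hidden layer is shown for brevity.

```python
import torch

def two_neuron_encode(state: torch.Tensor) -> torch.Tensor:
    """Map each state value to a (positive, negative) neuron pair.

    An n-dimensional state becomes a 2n-dimensional non-negative input:
    one neuron of the pair carries the value when it is positive, the other
    when it is negative, so the two are never active at the same time.
    """
    return torch.cat([torch.clamp(state, min=0.0),
                      torch.clamp(-state, min=0.0)], dim=-1)


def spiking_actor_forward(x, w_in, w_out, timesteps=5, threshold=1.0):
    """Integrate-and-fire forward pass with a non-spiking output layer.

    Hidden IF neurons accumulate input current and emit a spike when their
    membrane potential crosses `threshold` (then reset). Output neurons have
    an effectively infinite threshold: their membrane potential accumulates
    the weighted incoming spikes without decay and is read out directly as
    the continuous action.
    """
    v_hidden = torch.zeros(w_in.shape[1])
    v_out = torch.zeros(w_out.shape[1])
    input_current = x @ w_in                    # constant input current per step
    for _ in range(timesteps):
        v_hidden = v_hidden + input_current
        spikes = (v_hidden >= threshold).float()
        v_hidden = v_hidden * (1.0 - spikes)    # reset neurons that fired
        v_out = v_out + spikes @ w_out          # accumulate, never fire
    # Placeholder scaling of the accumulated potentials into the joint-torque limits [-1, 1]
    return torch.tanh(v_out)
```

During training, the non-differentiable spike threshold would be handled with a surrogate gradient so that the whole pipeline can be optimized with backpropagation, as described in the text.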

3.1. Robotic Arm Design

The robotic arm consists of three hinge joints, each performing rotational movements:
  • Joint 1: Rotation in the Z-axis (vertical axis).
  • Joint 2, 3: Rotation in the Y-axis (horizontal axis).
The manipulator consists of three links and a fingertip that serves as the end-effector. The total arm length is approximately 0.35 m, comprising:
  • A vertical base link of 0.05 m.
  • Two horizontal links of 0.1 m in length.
  • A fingertip with a diameter of 0.11 m.
All joints are controlled with torques in the range between −1.0 and 1.0. The fingertip is equipped with a contact detection sensor designed to identify its interaction with target objects.

3.2. Simulation Environment

In this study, to overcome the limitation that the existing 2-DoF robot arm could only move in a plane, we extended it to a 3-DoF robot arm operating in 3D space. The target position is randomly generated using a spherical coordinate system within the workspace of the robot arm and is placed at a certain distance above the floor. In addition, the target location was restricted to lie within the maximum reach of the robot arm to increase learning efficiency. The reward structure was designed around the distance to the goal, collision feedback, proximity to the goal, and goal attainment, with the goal-reaching reward weighted according to whether the robot arm is in collision. Additionally, a collision handling logic was implemented to detect collisions between the ground and the robot’s links, as well as between the links themselves. Penalties were applied based on these collisions, improving the stability of the robot’s learning process. Figure 3 shows the details of the simulation environment. This customization contributed to improving the performance of the robotic arm and enabled learning in more complex 3D work environments. The simulation environment was customized based on Mujoco’s 2-DoF Reacher-v4 environment with the following modifications:
  • DoF Expansion: The arm was redesigned for 3D movements by adding an additional degree of freedom.
  • Target Positioning: The target points are randomly generated within the workspace using spherical coordinates and are positioned to remain within the robot arm’s maximum reach.
  • Reward System: Distance-based rewards are refined to vary based on the distance to the target, with additional weighted adjustments applied upon reaching the target depending on collision status and proximity.
  • Collision Logic: Collisions between the ground and the robot arm’s links, as well as between the links themselves, are detected and penalized accordingly.
The simulation environment was configured with a gravity of 9.81 m/s² and a time step of 0.01 s, and the state values were as follows:
  • Cosine and sine of joint angles.
  • Joint angular velocities.
  • Target’s x, y, and z coordinates.
  • Distance vector between the fingertip and the target.
  • Flag indicating collision with the ground.
  • Flag indicating collision between links.
  • Flag indicating whether the goal has been reached.
A three-dimensional action vector is used to control the robotic arm by adjusting the angles of its three joints, $\theta_i$. The action vector is defined as follows:
$\mathrm{Action} = [\alpha_0, \alpha_1, \alpha_2]$
where $\alpha_i$ denotes the control signal applied to the $i$-th joint to adjust its angle $\theta_i$, with $i \in \{0, 1, 2\}$. By modifying these three joint angles, the end-effector of the robotic arm can be positioned at the desired location; a sketch combining these state and action definitions is given below.
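The following sketch shows how the 18-dimensional observation and the bounded three-joint action space described above could be assembled for a custom Gymnasium environment. The class and argument names are hypothetical placeholders, not the environment code used in this study.

```python
import numpy as np
from gymnasium import spaces

class Reacher3DSpaces:
    """Hypothetical helper fixing the observation/action layout of the custom environment."""

    def __init__(self):
        # 6 cos/sin of joint angles + 3 angular velocities + 3 target coordinates
        # + 3-element fingertip-to-target distance vector + 3 binary flags = 18
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(18,), dtype=np.float64)
        # One torque command per joint, bounded to [-1.0, 1.0] as in the arm design
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

    def build_observation(self, joint_angles, joint_velocities, target_xyz,
                          fingertip_xyz, ground_collision, link_collision, goal_reached):
        return np.concatenate([
            np.cos(joint_angles),                                 # 3 values
            np.sin(joint_angles),                                 # 3 values
            joint_velocities,                                     # 3 values
            target_xyz,                                           # 3 values
            np.asarray(fingertip_xyz) - np.asarray(target_xyz),   # 3-element distance vector
            [float(ground_collision), float(link_collision), float(goal_reached)],  # 3 flags
        ])
```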

3.3. Reward Function Design for Target Reaching

In reinforcement learning, the structure of the reward function is a key factor in determining policy performance [68]. The existing reacher environment’s reward function focuses only on distance and control penalties, neglecting factors like collision handling and stability after reaching the goal. Inspired by the integration of the collision penalties in reciprocal velocity obstacle-based rewards [69] and stage-based guidance in dense reward functions for robotic trajectory planning [70], we expanded the reward function of the existing reacher environment and designed a more sophisticated reward function to improve agent learning and performance. Unlike the original reward structure, which primarily focused on distance and control penalties, our design includes collision penalties and stage-based rewards determined by how close the fingertip is to the target.
The reward function used in the existing reacher environment consists of two components: distance rewards and control rewards. The distance reward is calculated as the negative value of the distance between the robot’s fingertip and the target, offering higher rewards as the fingertip approaches the target. The control reward penalizes the magnitude of movements to prevent excessive motion. However, the existing reward function has the following limitations:
  • Ignoring collisions: The existing reward function does not account for collisions between the robot and other objects in the environment (e.g., the ground or its own links), which might undermine the robot’s stability in realistic scenarios.
  • Lack of clarity in goal achievement: The reward function lacks clear criteria for determining whether the goal has been successfully reached, leading to potential instability in learning even after the goal is achieved.
To address these limitations and better reflect the real-world robot control scenarios, we designed a new reward function by incorporating the collision penalties and target-reaching rewards. The structure of the new reward function is detailed below:
  • Collision Reward: To detect and address collisions, the system identifies collision states within the environment and applies penalties or rewards accordingly. Penalties are imposed when the robot collides with the ground or when self-collisions occur between links, thereby reinforcing the stability of the robot’s movements.
  • Target-reaching Reward: Rewards are assigned proportionally based on the distance to the target, with higher rewards provided for closer proximity. After reaching the target, additional rewards or penalties are applied depending on the presence or absence of collisions, ensuring the robot maintains a stable state after achieving the goal.
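A minimal sketch of how these components could be combined into a single scalar reward is shown below. The penalty weights, bonus values, and the default 5 mm threshold are illustrative assumptions rather than the exact constants used in the study.

```python
import numpy as np

def compute_reward(distance, action, ground_collision, link_collision,
                   goal_reached, goal_threshold=0.005):
    """Distance and control terms plus the collision and target-reaching rewards."""
    reward = -float(distance)                        # closer to the target is better
    reward -= 0.1 * float(np.square(action).sum())   # penalize large control signals

    # Collision reward: penalize ground contact and self-collision between links
    if ground_collision:
        reward -= 1.0
    if link_collision:
        reward -= 1.0

    # Target-reaching reward: bonus near the target, reduced if a collision
    # state persists after the goal has been reached
    if goal_reached or distance < goal_threshold:
        reward += 10.0 if not (ground_collision or link_collision) else 2.0
    return reward
```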

4. Experimental Results

In this section, we present the results of the experiments evaluating the performance of the robot arm’s target-reaching task using the SNN-based TD3 model. The experiment analyzed the variations in the averaged reward received by the agent during a total of 40,000 learning episodes, and evaluated the success rate of reaching the target point provided through the IMU in the simulation. The model was trained based on randomly generated target points in the simulation environment. Figure 4 illustrates examples of the agent’s success and failure trials for the target-reaching tasks. In this experiment, the changes in rewards during the training process and the success rate of the target-reaching tasks are mainly investigated.

4.1. Experimental Settings

In the previous section, we described the Mujoco 3-DoF robotic arm environment designed for target-reaching tasks. The input state values consist of 18 elements: 6 values representing the cosine and sine of the 3 joint angles, 3 joint angular velocities, 3 target coordinates, a 3-element distance vector between the fingertip and the target, 2 flags indicating collisions with the ground and between links, and 1 flag indicating whether the target has been reached. Through the two-neuron encoder, the model’s input dimension is expanded to 36. However, since the DCN evaluates the value of specific states and actions, the three action values corresponding to the joints are added, resulting in a total of 39 inputs. Our networks consist of two hidden layers in which 256 IF neurons are implemented. The SAN is a policy network determining the agent’s actions and has three outputs, while the DCN learns the Q-values and has one output. The firing threshold of the SAN neurons is set to 1. The other parameters (e.g., memory buffer size, target update interval, discount factor, noise, etc.) are identical to those used by Akl et al. [41].
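The layer dimensioning described above can be summarized with a short, shape-level sketch. The critic is shown here with conventional ReLU units, following the deep-critic design discussed in Section 2, and the spiking dynamics of the actor (IF neurons with threshold 1, simulated over time and trained with surrogate gradients) are omitted; all names are placeholders.

```python
import torch.nn as nn

STATE_DIM = 18                 # raw environment state
ENCODED_DIM = 2 * STATE_DIM    # 36 after two-neuron encoding
ACTION_DIM = 3                 # one control value per joint
HIDDEN = 256                   # neurons per hidden layer

# Deep critic network (DCN): evaluates Q(s, a). Its input concatenates the
# encoded state with the three joint actions (36 + 3 = 39) and it outputs a
# single Q-value.
critic = nn.Sequential(
    nn.Linear(ENCODED_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
    nn.Linear(HIDDEN, 1),
)

# Spiking actor network (SAN): the same layer shapes, but the hidden units
# are IF neurons whose spikes are simulated over discrete time steps; only
# the weight dimensions are fixed here.
san_layers = nn.ModuleList([
    nn.Linear(ENCODED_DIM, HIDDEN),
    nn.Linear(HIDDEN, HIDDEN),
    nn.Linear(HIDDEN, ACTION_DIM),
])
```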

4.2. Training and Testing Results for the Target-Reaching Task

Figure 5 shows the results of the SNN-based TD3 model trained in the customized environment, with each episode limited to a maximum of 500 steps. In the early episodes, the averaged rewards exhibited low values and significant fluctuation. However, as training progressed, the averaged reward gradually increased. In the later episodes of training, the rewards stabilized within a range between 2500 and 3000, indicating the model’s high success rate in performing the target-reaching tasks. However, sharp decreases in reward were observed around the 25,000th and 30,000th episodes, which could be interpreted as cases where the model had not sufficiently generalized to a specific situation. This phenomenon aligns with the well-known issue of catastrophic forgetting in reinforcement learning [71], where the model might forget previously learned policies while adapting to new data, such as the random target positions. Despite these challenges, the agent’s performance improved consistently, with the rewards gradually rising and stabilizing after 37,000 episodes. The trained model demonstrated excellent policy optimization, accurately handling various target positions in the simulation environment and achieving outstanding performance in the target-reaching task.
To test the arm-movement target-reaching task, IMU data were recorded using the Myo armband developed by Thalmic Labs [72]. The user’s arm movements and posture were tracked in real time through the gyroscope and accelerometer of the armband’s IMU sensor and converted into target points in the Mujoco Reacher-v4 environment. Although the gyroscope produces accurate posture information, its errors accumulate and diverge over time [73]. On the other hand, the posture obtained from the accelerometer has the advantage that its error does not grow over time and remains bounded within a certain range [74]. Since the data from these two sensors are complementary, they were fused. Additionally, a Kalman filter was applied to reduce noise and improve the accuracy of the estimated posture [75]. This minimized the error and uncertainty of the IMU data, enabling the user’s arm movements to be accurately mapped to target points in the virtual environment.
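For reference, the sketch below shows a textbook single-axis Kalman filter that fuses a gyroscope rate with an accelerometer-derived angle, in the spirit of the fusion described above. The noise parameters are placeholders, and the filter actually used in the study, which operates on the full arm orientation from the Myo armband, may differ.

```python
import numpy as np

class AngleKalmanFilter:
    """Single-axis Kalman filter fusing gyroscope rate and accelerometer angle."""

    def __init__(self, q_angle=0.001, q_bias=0.003, r_measure=0.03):
        self.q_angle, self.q_bias, self.r_measure = q_angle, q_bias, r_measure
        self.angle = 0.0            # estimated angle
        self.bias = 0.0             # estimated gyroscope bias
        self.P = np.zeros((2, 2))   # error covariance

    def update(self, accel_angle, gyro_rate, dt):
        # Predict: integrate the bias-corrected gyroscope rate
        rate = gyro_rate - self.bias
        self.angle += dt * rate
        self.P[0, 0] += dt * (dt * self.P[1, 1] - self.P[0, 1] - self.P[1, 0] + self.q_angle)
        self.P[0, 1] -= dt * self.P[1, 1]
        self.P[1, 0] -= dt * self.P[1, 1]
        self.P[1, 1] += self.q_bias * dt

        # Correct: blend in the drift-free (but noisy) accelerometer angle
        S = self.P[0, 0] + self.r_measure
        K0, K1 = self.P[0, 0] / S, self.P[1, 0] / S
        y = accel_angle - self.angle
        self.angle += K0 * y
        self.bias += K1 * y
        P00, P01 = self.P[0, 0], self.P[0, 1]
        self.P[0, 0] -= K0 * P00
        self.P[0, 1] -= K0 * P01
        self.P[1, 0] -= K1 * P00
        self.P[1, 1] -= K1 * P01
        return self.angle
```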
As shown in Figure 6, the robotic arm, controlled by the trained SAN agent, followed the target points and effectively mimicked the arm-movement target-reaching task. In the experiment, a target-reaching trial was considered successful when the distance between the target point and the fingertip of the robot arm was less than 5 mm, and the success rate was calculated on this basis. As a result, the targets were reached with a success rate of 0.95 across 50 trials, as shown in Table 1, which also summarizes the execution time and performance metrics for both training and testing. The SAN showed high adaptability to the target point in the robot arm control and enabled precise control that followed the user’s movements in real time. This approach leverages the advantages of reinforcement learning to provide high accuracy and stability even in complex environments, and it enables an intuitive and efficient interface between users and robots.

5. Discussion

The implementation of the SNN-based TD3 for the 3D target-reaching tasks demonstrates significant advancements in robotic arm control. The experimental results show stable performance in controlling the 3-DoF robotic arm through the extension of Mujoco’s Reacher-v4 environment to handle the 3D movements and incorporating an SNN-based TD3 algorithm. The spiking actor network learned accurate and stable control policies over 40,000 training episodes, achieving a 95% success rate in target-reaching trials with a precision threshold less than 5 mm.
The integration of the IMU data for the arm-movement target-reaching task addresses the fundamental limitations of camera-based systems. The IMU-based approach provides robust performance regardless of environmental conditions such as lighting or obstacles, enabling real-time control capabilities. Additionally, the successful implementation of TD3 with SNNs for 3D target reaching effectively overcomes the limitations of previous SDDPG implementations, particularly in handling the Q-value overestimation problem.
The SNN-based approach demonstrates promising potential for future robotic applications based on the simulation results. Previous studies have shown that SNNs, when implemented on neuromorphic hardware, could achieve significant power consumption benefits compared to the traditional neural networks [32,33]. While our simulation demonstrates the effectiveness of combining SNNs with TD3 for the robotic control, the actual energy efficiency benefits would need to be validated through hardware implementation. The integration of the IMU data enables direct human–robot interfaces, suggesting potential applications in the prosthesis technologies and industrial automation once validated on physical systems.
Despite this strong performance, several limitations require attention. Training and evaluation of the proposed SNN algorithm were performed in a simulation environment, necessitating validation under real-world conditions that include unmodeled friction, mechanical tolerances, and sensor noise. The reward function primarily incentivized target reaching and collision avoidance without addressing more complex tasks such as path planning around obstacles. The reliance on IMU signals for target position inference, while allowing real-time interaction, might reduce accuracy during abrupt arm motions or under sensor drift without more advanced filtering or calibration strategies. Finally, the computational and energy efficiency of the SNN-based methods require validation on actual neuromorphic hardware to confirm their hardware-level advantages.
The application of the spiking actor networks within the TD3 framework effectively solves the higher-dimensional robotic arm tasks, expanding the possibilities of real-time and low-power motion control. The incorporation of the IMU data as a direct means of teleoperation advances the human–robot interaction capabilities. These findings suggest that an SNN-based TD3 control scheme could serve as the foundation for future applications requiring human-like fluidity, robustness, and low-latency control. Further investigations into domain adaptation, transfer learning from simulation to hardware, and more complex multi-step tasks would extend the potential of this method. The deployment of SNN-based reinforcement learning on the neuromorphic devices for real-world robotic arms would enable the energy-efficient, adaptive, and safer robots for human assistance in daily activities and specialized tasks such as rehabilitation and telepresence.

6. Conclusions

This study demonstrated the successful implementation of the SNN-based TD3 for the 3D target-reaching tasks mimicking human arm movements. The integration of SNNs with TD3 reinforcement learning achieved stable performance in controlling a 3-DoF robotic arm, with a 95% success rate in target-reaching trials. The system effectively combined the neuromorphic computing principles with an advanced reinforcement learning algorithm to achieve precise and efficient robot control. The integration of the IMU data provided robust performance regardless of the environmental conditions, overcoming the limitations of the conventional camera-based systems.

Author Contributions

Methodology, Y.P.; Formal analysis, J.L.; Investigation, Y.P. and J.L.; Writing—original draft, Y.P. and J.L.; Writing—review & editing, D.S., Y.C. and C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2024-RS-2022-00156225) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation). This work was also supported by the Technology Innovation Program (RS-2024-00154678, Development of Intelligent Sensor Platform Technology for Connected Sensor) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea). The work reported in this paper was conducted during the sabbatical year of Kwangwoon University in 2024.

Data Availability Statement

Data are confidential and can be made available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Akanyeti, O.; Nehmzow, U.; Billings, S.A. Robot training using system identification. Robot. Auton. Syst. 2008, 56, 1027–1041. [Google Scholar] [CrossRef]
  2. Mompó Alepuz, A.; Papageorgiou, D.; Tolu, S. Brain-inspired biomimetic robot control: A review. Front. Neurorobot. 2024, 18, 1395617. [Google Scholar] [CrossRef] [PubMed]
  3. Gulletta, G.; Erlhagen, W.; Bicho, E. Human-like arm motion generation: A review. Robotics 2020, 9, 102. [Google Scholar] [CrossRef]
  4. Shim, S.; Kim, J.Y.; Hwang, S.W.; Oh, J.M.; Kim, B.K.; Park, J.H.; Hyun, D.J.; Lee, H. A comprehensive review of cyber-physical system (CPS)-based approaches to robot services. IEIE Trans. Smart Process. Comput. 2024, 13, 69–80. [Google Scholar] [CrossRef]
  5. Kim, S.; Kim, C.; Park, J.H. Human-like arm motion generation for humanoid robots using motion capture database. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 9–15 October 2006; pp. 3486–3491. [Google Scholar]
  6. Lee, K.; Shin, U.; Lee, B.U. Learning to Control Camera Exposure via Reinforcement Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 2975–2983. [Google Scholar]
  7. Schabron, B.; Reust, A.; Desai, J.; Yihun, Y. Integration of forearm sEMG signals with IMU sensors for trajectory planning and control of assistive robotic arm. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 5274–5277. [Google Scholar]
  8. Ghadge, K.; More, S.; Gaikwad, P.; Chillal, S. Robotic arm for pick and place application. Int. J. Mech. Eng. Technol. 2018, 9, 125–133. [Google Scholar]
  9. Shih, C.L.; Lee, Y. A simple robotic eye-in-hand camera positioning and alignment control method based on parallelogram features. Robotics 2018, 7, 31. [Google Scholar] [CrossRef]
  10. Marwan, Q.M.; Chua, S.C.; Kwek, L.C. Comprehensive review on reaching and grasping of objects in robotics. Robotica 2021, 39, 1849–1882. [Google Scholar] [CrossRef]
  11. Kucuk, S.; Bingul, Z. Robot Kinematics: Forward and Inverse Kinematics; INTECH Open Access Publisher: London, UK, 2006. [Google Scholar]
  12. Slim, M.; Rokbani, N.; Neji, B.; Terres, M.A.; Beyrouthy, T. Inverse kinematic solver based on bat algorithm for robotic arm path planning. Robotics 2023, 12, 38. [Google Scholar] [CrossRef]
  13. Csiszar, A.; Eilers, J.; Verl, A. On solving the inverse kinematics problem using neural networks. In Proceedings of the 2017 24th International Conference on Mechatronics and Machine Vision in Practice (M2VIP), Auckland, New Zealand, 21–23 November 2017; pp. 1–6. [Google Scholar]
  14. Bensadoun, R.; Gur, S.; Blau, N.; Wolf, L. Neural inverse kinematic. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 1787–1797. [Google Scholar]
  15. De Momi, E.; Kranendonk, L.; Valenti, M.; Enayati, N.; Ferrigno, G. A neural network-based approach for trajectory planning in robot—Human handover tasks. Front. Robot. AI 2016, 3, 34. [Google Scholar] [CrossRef]
  16. Gu, S.; Holly, E.; Lillicrap, T.P.; Levine, S. Deep reinforcement learning for robotic manipulation. arXiv 2016, arXiv:1610.00633. [Google Scholar]
  17. D’Souza, A.; Vijayakumar, S.; Schaal, S. Learning inverse kinematics. In Proceedings of the 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No. 01CH37180), Maui, HI, USA, 29 October–3 November 2001; Volume 1, pp. 298–303. [Google Scholar]
  18. Zhao, C.; Wei, Y.; Xiao, J.; Sun, Y.; Zhang, D.; Guo, Q.; Yang, J. Inverse kinematics solution and control method of 6-degree-of-freedom manipulator based on deep reinforcement learning. Sci. Rep. 2024, 14, 12467. [Google Scholar] [CrossRef] [PubMed]
  19. Zhong, J.; Wang, T.; Cheng, L. Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics. Complex Intell. Syst. 2022, 8, 1899–1912. [Google Scholar] [CrossRef]
  20. Yu, S.; Tan, G. Inverse Kinematics of a 7-Degree-of-Freedom Robotic Arm Based on Deep Reinforcement Learning and Damped Least Squares. IEEE Access 2024, 13, 4857–4868. [Google Scholar] [CrossRef]
  21. Lin, G.; Huang, P.; Wang, M.; Xu, Y.; Zhang, R.; Zhu, L. An inverse kinematics solution for a series-parallel hybrid banana-harvesting robot based on deep reinforcement learning. Agronomy 2022, 12, 2157. [Google Scholar] [CrossRef]
  22. Yoon, Y.K.; Park, K.H.; Shim, D.W.; Han, S.H.; Lee, J.W.; Jung, M. Robotic-assisted foot and ankle surgery: A review of the present status and the future. Biomed. Eng. Lett. 2023, 13, 571–577. [Google Scholar] [CrossRef]
  23. Lee, Y.-S. Approach to smart mobility intelligent traffic signal system based on distributed deep reinforcement learning. IEIE Trans. Smart Process. Comput. 2024, 13, 89–95. [Google Scholar] [CrossRef]
  24. Mohammed, M.Q.; Chung, K.L.; Chyi, C.S. Review of deep reinforcement learning-based object grasping: Techniques, open challenges, and recommendations. IEEE Access 2020, 8, 178450–178481. [Google Scholar] [CrossRef]
  25. Liu, Y.C.; Huang, C.Y. DDPG-based adaptive robust tracking control for aerial manipulators with decoupling approach. IEEE Trans. Cybern. 2021, 52, 8258–8271. [Google Scholar] [CrossRef]
  26. Dong, Y.; Zou, X. Mobile robot path planning based on improved DDPG reinforcement learning algorithm. In Proceedings of the 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 16–18 October 2020; pp. 52–56. [Google Scholar]
  27. Gong, H.; Wang, P.; Ni, C.; Cheng, N. Efficient path planning for mobile robot based on deep deterministic policy gradient. Sensors 2022, 22, 3579. [Google Scholar] [CrossRef]
  28. Naya, K.; Kutsuzawa, K.; Owaki, D.; Hayashibe, M. Spiking neural network discovers energy-efficient hexapod motion in deep reinforcement learning. IEEE Access 2021, 9, 150345–150354. [Google Scholar] [CrossRef]
  29. Tang, G.; Kumar, N.; Michmizos, K.P. Reinforcement co-learning of deep and spiking neural networks for energy-efficient mapless navigation with neuromorphic hardware. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2020; pp. 6090–6097. [Google Scholar]
  30. Tang, G.; Kumar, N.; Yoo, R.; Michmizos, K. Deep reinforcement learning with population-coded spiking neural network for continuous control. In Proceedings of the Conference on Robot Learning, PMLR, London, UK, 8–11 November 2021; pp. 2016–2029. [Google Scholar]
  31. Tavanaei, A.; Ghodrati, M.; Kheradpisheh, S.R.; Masquelier, T.; Maida, A. Deep learning in spiking neural networks. Neural Netw. 2019, 111, 47–63. [Google Scholar] [CrossRef] [PubMed]
  32. Tang, G.; Shah, A.; Michmizos, K.P. Spiking neural network on neuromorphic hardware for energy-efficient unidimensional slam. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 4176–4181. [Google Scholar]
  33. Yamazaki, K.; Vo-Ho, V.K.; Bulsara, D.; Le, N. Spiking neural networks and their applications: A review. Brain Sci. 2022, 12, 863. [Google Scholar] [CrossRef] [PubMed]
  34. Kim, E.; Kim, Y. Exploring the potential of spiking neural networks in biomedical applications: Advantages, limitations, and future perspectives. Biomed. Eng. Lett. 2024, 14, 1–14. [Google Scholar] [CrossRef] [PubMed]
  35. Balakrishnan, P.; Baskaran, B.; Vivekanan, S.; Gokul, P. Binarized spiking neural networks optimized with color harmony algorithm for liver cancer classification. IEIE Trans. Smart Process. Comput. 2023, 12, 502–510. [Google Scholar] [CrossRef]
  36. Oikonomou, K.M.; Kansizoglou, I.; Gasteratos, A. A framework for active vision-based robot planning using spiking neural networks. In Proceedings of the 2022 30th Mediterranean Conference on Control and Automation (MED), Vouliagmeni, Greece, 28 June–1 July 2022; pp. 867–871. [Google Scholar]
  37. Oikonomou, K.M.; Kansizoglou, I.; Gasteratos, A. A hybrid spiking neural network reinforcement learning agent for energy-efficient object manipulation. Machines 2023, 11, 162. [Google Scholar] [CrossRef]
  38. Oikonomou, K.M.; Kansizoglou, I.; Gasteratos, A. A hybrid reinforcement learning approach with a spiking actor network for efficient robotic arm target reaching. IEEE Robot. Autom. Lett. 2023, 8, 3007–3014. [Google Scholar] [CrossRef]
  39. Dankwa, S.; Zheng, W. Twin-delayed ddpg: A deep reinforcement learning technique to model a continuous movement of an intelligent robot agent. In Proceedings of the 3rd International Conference on Vision, Image and Signal Processing, Vancouver, BC, Canada, 26–28 August 2019; pp. 1–5. [Google Scholar]
  40. Yang, X.; Song, J.; Zhang, X.; Wang, D. Adaptive Spiking TD3+ BC for Offline-to-Online Spiking Reinforcement Learning. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–6. [Google Scholar]
  41. Akl, M.; Ergene, D.; Walter, F.; Knoll, A. Toward robust and scalable deep spiking reinforcement learning. Front. Neurorobot. 2023, 16, 1075647. [Google Scholar] [CrossRef]
  42. Xie, B.; Zhao, J.; Liu, Y. Human-like motion planning for robotic arm system. In Proceedings of the 2011 15th International Conference on Advanced Robotics (ICAR), Tallinn, Estonia, 20–23 June 2011; pp. 88–93. [Google Scholar]
  43. Rosado, J.; Silva, F.; Santos, V.; Lu, Z. Reproduction of human arm movements using Kinect-based motion capture data. In Proceedings of the 2013 IEEE International Conference on Robotics and Biomimetics (ROBIO), Shenzhen, China, 12–14 December 2013; pp. 885–890. [Google Scholar]
  44. Zhao, J.; Xie, B.; Song, C. Generating human-like movements for robotic arms. Mech. Mach. Theory 2014, 81, 107–128. [Google Scholar] [CrossRef]
  45. Shin, S.Y.; Kim, C. Human-like motion generation and control for humanoid’s dual arm object manipulation. IEEE Trans. Ind. Electron. 2014, 62, 2265–2276. [Google Scholar] [CrossRef]
  46. Zhao, J.; Monforte, M.; Indiveri, G.; Bartolozzi, C.; Donati, E. Learning inverse kinematics using neural computational primitives on neuromorphic hardware. NPJ Robot. 2023, 1, 1. [Google Scholar] [CrossRef]
  47. Liu, Z.; Hu, F.; Luo, D.; Wu, X. Learning arm movements of target reaching for humanoid robot. In Proceedings of the 2015 IEEE International Conference on Information and Automation, Lijiang, China, 8–10 August 2015; pp. 707–713. [Google Scholar]
  48. Nobre, F.; Heckman, C. Learning to calibrate: Reinforcement learning for guided calibration of visual–inertial rigs. Int. J. Robot. Res. 2019, 38, 1388–1402. [Google Scholar] [CrossRef]
  49. Ahmed, M.H.; Kutsuzawa, K.; Hayashibe, M. Transhumeral arm reaching motion prediction through deep reinforcement learning-based synthetic motion cloning. Biomimetics 2023, 8, 367. [Google Scholar] [CrossRef] [PubMed]
  50. Bouganis, A.; Shanahan, M. Training a spiking neural network to control a 4-dof robotic arm based on spike timing-dependent plasticity. In Proceedings of the The 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; pp. 1–8. [Google Scholar]
  51. Chen, X.; Zhu, W.; Dai, Y.; Ren, Q. A bio-inspired spiking neural network for control of a 4-dof robotic arm. In Proceedings of the 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, 9–13 November 2020; pp. 616–621. [Google Scholar]
  52. Tieck, J.C.V.; Steffen, L.; Kaiser, J.; Roennau, A.; Dillmann, R. Controlling a robot arm for target reaching without planning using spiking neurons. In Proceedings of the 2018 IEEE 17th International Conference on Cognitive Informatics & Cognitive Computing (ICCI* CC), Berkeley, CA, USA, 16–18 July 2018; pp. 111–116. [Google Scholar]
  53. Hulea, M.; Uleru, G.I.; Caruntu, C.F. Adaptive SNN for anthropomorphic finger control. Sensors 2021, 21, 2730. [Google Scholar] [CrossRef]
  54. Volinski, A.; Zaidel, Y.; Shalumov, A.; DeWolf, T.; Supic, L.; Tsur, E.E. Data-driven artificial and spiking neural networks for inverse kinematics in neurorobotics. Patterns 2022, 3, 100391. [Google Scholar] [CrossRef]
  55. Zahra, O.; Tolu, S.; Navarro-Alarcon, D. Differential mapping spiking neural network for sensor-based robot control. Bioinspir. Biomimetics 2021, 16, 036008. [Google Scholar] [CrossRef]
  56. Polykretis, I.; Supic, L.; Danielescu, A. Bioinspired smooth neuromorphic control for robotic arms. Neuromorphic Comput. Eng. 2023, 3, 014013. [Google Scholar] [CrossRef]
  57. Tieck, J.C.V.; Becker, P.; Kaiser, J.; Peric, I.; Akl, M.; Reichard, D.; Roennau, A.; Dillmann, R. Learning target reaching motions with a robotic arm using brain-inspired dopamine modulated STDP. In Proceedings of the 2019 IEEE 18th International Conference on Cognitive Informatics & Cognitive Computing (ICCI* CC), Milan, Italy, 23–25 July 2019; pp. 54–61. [Google Scholar]
  58. Juarez-Lora, A.; Ponce-Ponce, V.H.; Sossa, H.; Rubio-Espino, E. R-STDP spiking neural network architecture for motion control on a changing friction joint robotic arm. Front. Neurorobot. 2022, 16, 904017. [Google Scholar] [CrossRef]
  59. Neftci, E.O.; Mostafa, H.; Zenke, F. Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Process. Mag. 2019, 36, 51–63. [Google Scholar] [CrossRef]
  60. Matheron, G.; Perrin, N.; Sigaud, O. The problem with DDPG: Understanding failures in deterministic environments with sparse rewards. arXiv 2019, arXiv:1911.11679. [Google Scholar]
  61. Brockman, G. OpenAI Gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
  62. Towers, M.; Kwiatkowski, A.; Terry, J.; Balis, J.U.; De Cola, G.; Deleu, T.; Goulão, M.; Kallinteris, A.; Krimmel, M.; KG, A.; et al. Gymnasium: A standard interface for reinforcement learning environments. arXiv 2024, arXiv:2407.17032. [Google Scholar]
  63. Zenke, F. SpyTorch (v0.3). 2019. Available online: https://zenodo.org/records/3724018 (accessed on 27 January 2025).
  64. Todorov, E.; Erez, T.; Tassa, Y. Mujoco: A physics engine for model-based control. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 5026–5033. [Google Scholar]
  65. Pérez-Carrasco, J.A.; Zhao, B.; Serrano, C.; Acha, B.; Serrano-Gotarredona, T.; Chen, S.; Linares-Barranco, B. Mapping from frame-driven to frame-free event-driven vision systems by low-rate rate coding and coincidence processing–application to feedforward ConvNets. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2706–2719. [Google Scholar] [CrossRef] [PubMed]
  66. Akl, M.; Sandamirskaya, Y.; Ergene, D.; Walter, F.; Knoll, A. Fine-tuning deep reinforcement learning policies with r-stdp for domain adaptation. In Proceedings of the International Conference on Neuromorphic Systems 2022, Knoxville, TN, USA, 27–29 July 2022; pp. 1–8. [Google Scholar]
  67. Burkitt, A.N. A review of the integrate-and-fire neuron model: I. Homogeneous synaptic input. Biol. Cybern. 2006, 95, 1–19. [Google Scholar] [CrossRef] [PubMed]
  68. Eschmann, J. Reward function design in reinforcement learning. In Reinforcement Learning Algorithms: Analysis and Applications; Springer: Cham, Switzerland, 2021; pp. 25–33. [Google Scholar]
  69. Han, R.; Chen, S.; Wang, S.; Zhang, Z.; Gao, R.; Hao, Q.; Pan, J. Reinforcement learned distributed multi-robot navigation with reciprocal velocity obstacle shaped rewards. IEEE Robot. Autom. Lett. 2022, 7, 5896–5903. [Google Scholar] [CrossRef]
  70. Xie, J.; Shao, Z.; Li, Y.; Guan, Y.; Tan, J. Deep reinforcement learning with optimized reward functions for robotic trajectory planning. IEEE Access 2019, 7, 105669–105679. [Google Scholar] [CrossRef]
  71. Cahill, A. Catastrophic Forgetting in Reinforcement-Learning Environments. Ph.D. Thesis, University of Otago, Dunedin, New Zealand, 2011. [Google Scholar]
  72. Muhammad, U.; Sipra, K.A.; Waqas, M.; Tu, S. Applications of myo armband using EMG and IMU signals. In Proceedings of the 2020 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA), Shanghai, China, 16–18 October 2020; pp. 6–11. [Google Scholar]
  73. Gui, P.; Tang, L.; Mukhopadhyay, S. MEMS based IMU for tilting measurement: Comparison of complementary and kalman filter based data fusion. In Proceedings of the 2015 IEEE 10th Conference on Industrial Electronics and Applications (ICIEA), Auckland, New Zealand, 15–17 June 2015; pp. 2004–2009. [Google Scholar]
  74. Alatise, M.B.; Hancke, G.P. Pose estimation of a mobile robot based on fusion of IMU data and vision data using an extended Kalman filter. Sensors 2017, 17, 2164. [Google Scholar] [CrossRef]
  75. El-Gohary, M.; McNames, J. Human joint angle estimation with inertial sensors and validation with a robot arm. IEEE Trans. Biomed. Eng. 2015, 62, 1759–1767. [Google Scholar] [CrossRef]
Figure 1. Structure of the reinforcement learning environment space including a 3-DoF robotic arm and target point. (a) The default Reacher-v4 environment in Mujoco. (b) The customized 3-DoF robotic arm environment we designed. (c) Joint-level rotations of the customized robot arm environment in Mujoco. (d) Coordinate systems and links of the customized 3-DoF robotic arm environment.
Figure 2. SNN-based TD3 Architecture.
Figure 3. Illustration of the simulation environment and the input state values, consisting of 18 elements. (a) Six values representing the cosine and sine of three joints and their angular velocities. (b) Three target coordinates. (c) A distance vector between the fingertip and the target coordinates. (d) Two flags for ground and link collisions, and one for target reach status.
Figure 4. The agent’s actions for the target-reaching tasks during the training process: (a) an example of the failed target-reaching task, and (b) an example of the successful target-reaching attempts.
Figure 5. Averaged reward achieved by the agent during the training episodes.
Figure 6. A robotic arm mimicking human arm movement in the testing.
Table 1. The performances of the target-reaching tasks.
                        Success Rate     Execution Time (ms)
Training Performance    0.95 ± 0.043     0.224 ± 0.062
Testing Performance     0.95 ± 0.037     0.231 ± 0.054