Article

An Energy-Efficient Neuromorphic Processor Using Unified Refractory Control-Based NoC for Edge AI

Department of Semiconductor Systems Engineering, Sejong University, Seoul 05006, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(24), 4959; https://doi.org/10.3390/electronics14244959
Submission received: 6 November 2025 / Revised: 11 December 2025 / Accepted: 15 December 2025 / Published: 17 December 2025

Abstract

Neuromorphic computing has emerged as a promising paradigm for edge AI systems owing to its event-driven operation and high energy efficiency. However, conventional spiking neural network (SNN) architectures often suffer from redundant computation and inefficient power control, particularly during on-chip learning. This paper proposes a network-on-chip (NoC) architecture featuring a unified refractory-enabled neuron (UREN)-based router that globally coordinates spike-driven computation across multiple neuron cores. The router applies a unified refractory time to all neurons following a winner spike event, effectively enabling clock gating and suppressing redundant activity. The proposed design adopts a star-routing topology with multicasting support and integrates nearest-neighbor spike-timing-dependent plasticity (STDP) for local online learning. FPGA-based experiments demonstrate a 30% reduction in computation compared with baseline SNN implementations, along with 86.1% online classification accuracy on the MNIST dataset. These results confirm that the UREN-based router provides a scalable and power-efficient neuromorphic processor architecture, well suited for energy-constrained edge AI applications.

1. Introduction

Neuromorphic computing based on spiking neural networks (SNNs) has garnered significant attention in the field of AI hardware accelerators owing to its energy-efficient, biologically inspired computational paradigm [1]. Unlike conventional von Neumann architectures, neuromorphic processors emulate the spiking behavior of biological neurons, enabling event-driven computation [2]. Owing to this power efficiency, they have been widely adopted in numerous application domains, including edge AI [3,4,5], sensor data processing [6], and robotics [7], among others [8].
Recently, neuromorphic processors have increasingly incorporated various biologically inspired synaptic plasticity mechanisms and algorithmic learning techniques to support on-chip learning. Representative examples include spike-timing-dependent plasticity (STDP) [9,10] and back-propagation through time (BPTT) [11], each with distinct learning-supervision requirements and hardware implementation complexity. While research on learning algorithms has been actively pursued, studies addressing power efficiency and system-level control mechanisms during learning execution remain relatively limited [12].
In particular, single-layer SNNs face performance limitations as the number of neurons and synapses increases, which can result in memory access bottlenecks, inter-neuron communication overhead, and reduced computational throughput [13]. These limitations present practical challenges for energy-constrained edge device applications, where efficiency is crucial.
Traditional on-chip communication methods, such as bus-based or point-to-point connections, face scalability limitations when handling millions of inter-neuron connections in neuromorphic systems [14]. To address this, the Address Event Representation (AER) scheme assigns a unique address to each neuron and encodes spike events as packets transmitted to their respective destinations. Although this scheme is attractive for its simplicity and scalability, it exhibits inherent structural limitations for the fully connected network structures common in SNNs:
  • Hop distance overhead: In a 2D mesh topology, as the physical distance between neurons increases, spike packets must traverse more intermediate routers, leading to an increase in the number of hops [15]. As the hop count grows, data transmission latency increases, and energy consumption accumulates at each hop, raising the likelihood of network congestion. These structural inefficiencies become particularly problematic in SNNs, where a large number of spike events occur within short time windows, significantly degrading performance in real-time, energy-constrained edge device environments.
  • Topology mismatch: Most neuromorphic processors utilizing STDP are structured as fully connected single-layer networks. In such architectures, each neuron is connected to all others, resulting in frequent non-local communications. However, the 2D mesh topology is optimized for local, spatially adjacent communication. Consequently, when large-scale communication between distant neurons is repeated—as in fully connected structures—this leads to increased latency and energy overhead.
  • Lack of NoC-level control: Most neuromorphic NoC designs focus primarily on routing efficiency and scalability. Few architectures manage neuron state-synchronized control signals at the network level. In reality, many neurons remain inactive during computation cycles [16], yet still receive unnecessary clock signals and maintain their state, leading to power waste. This lack of fine-grained computation control results in energy inefficiencies, especially detrimental in low-power edge AI scenarios.
To address these limitations, this paper proposes a neuromorphic processor architecture that incorporates a star routing topology–based NoC. In the proposed design, spike data in packet form is sent to a central router equipped with multicasting capability, which replicates the spike packet and distributes it to all connected neuron cores, effectively implementing a fully connected network (Figure 1). Furthermore, a Winner-Take-All (WTA) mechanism is employed to detect spike activations. When a neuron fires, a unified refractory time is applied across all neurons, and this process is directly controlled at the router level. Unlike conventional designs where refractory behavior and clock-gating logic reside inside each neuron or core, the proposed approach elevates these mechanisms to the NoC layer, allowing the router to orchestrate computation timing across the entire SNN.
The selection of a star topology reflects the requirements of fully connected, single-layer online STDP. Prior studies show that mesh-based NoCs incur increasing hop delay and energy as system size grows [17]. In contrast, the star configuration provides deterministic one-hop multicast and minimal routing delay—properties beneficial for edge-scale deployments where tightly coupled neuron cores interact frequently. By aligning communication structure with learning workload, the proposed design reduces network-level overhead while simplifying router logic.
Through this combination of unified refractory control and low-diameter routing, the architecture achieves system-wide suppression of redundant computation, efficient event-driven operation, and scalable performance for on-chip online learning. The remainder of this paper is organized as follows. Section 2 introduces the neuron model and learning rule, along with implementation-level optimizations for efficient hardware deployment. Section 3 describes the proposed hardware architecture in detail. Section 4 presents experimental results that validate the effectiveness of the proposed system. Finally, Section 5 concludes the paper with a summary and discussion of future directions.

2. Neuron Model and Learning Rule

2.1. Neuron Model

Among the many neuron models proposed for SNNs (ranging from the biologically detailed Hodgkin–Huxley model to simplified models such as QIF, AdEx, and Izhikevich), the Leaky Integrate-and-Fire (LIF) model is the most widely used in hardware implementations due to its simplicity and efficiency [18].
This work therefore adopts the LIF model, the simplest of these forms and well suited to hardware implementation [19]. In the LIF model, each neuron updates its membrane potential at every time step based on the input spikes, the synaptic weights connected to those inputs, and the leak constant λ. The neuron fires and generates an output spike when the updated membrane potential exceeds the threshold. Equation (1) describes the membrane potential update, and Equation (2) the firing condition.
$$U_j[t] = \begin{cases} U_{\mathrm{reset}}, & \text{if } (t - t_j^{\mathrm{fire}}) < T_{\mathrm{ref}} \\ U_j[t-1] + \sum_i W_{ij} S_i[t] - \lambda, & \text{otherwise} \end{cases} \qquad (1)$$
$$S_j[t] = \Theta\!\left(U_j[t] - U_{\mathrm{th}}\right) \qquad (2)$$
where $\Theta(\cdot)$ denotes the Heaviside step function.
A neuron that fires at time t is reset to its initial value and held there for the refractory period $T_{\mathrm{ref}}$, remaining unresponsive to further inputs during that window, mirroring the refractory behavior of biological neurons. In addition, because spikes are discrete binary data, a zero-skip scheme can bypass the weight accumulation for silent inputs, avoiding unnecessary hardware activity.
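For concreteness, the following minimal Python sketch performs one LIF time step according to Equations (1) and (2), including the zero-skip over silent inputs; all names (`lif_step`, `refractory_left`, and so on) are illustrative and not taken from the actual RTL.

```python
import numpy as np

def lif_step(u, w, spikes_in, leak, u_th, u_reset, t_ref, refractory_left):
    """One LIF time step for a layer of output neurons (Eqs. (1)-(2)).

    u               : membrane potentials, shape (n_out,)
    w               : synaptic weights, shape (n_in, n_out)
    spikes_in       : binary input spikes, shape (n_in,)
    refractory_left : per-neuron refractory countdowns, shape (n_out,)
    """
    active = refractory_left == 0               # outside the refractory window
    # Zero-skip: spikes are binary, so only rows of w driven by an actual
    # input spike are accumulated; silent inputs cost nothing.
    psp = w[np.flatnonzero(spikes_in), :].sum(axis=0)
    u = np.where(active, u + psp - leak, u)     # Eq. (1): integrate and leak
    spikes_out = active & (u >= u_th)           # Eq. (2): threshold crossing
    u[spikes_out] = u_reset                     # reset neurons that fired
    refractory_left[spikes_out] = t_ref         # start their refractory window
    refractory_left[~active] -= 1               # count down frozen neurons
    return u, spikes_out, refractory_left
```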
As the network operates under an unsupervised learning approach, it is crucial to establish competitive dynamics among neurons to determine which neuron responds most actively to a given input; this effectively enables implicit input labeling.
Lateral inhibition facilitates this process by enhancing the firing activity of neurons that are highly responsive to specific input features; this is achieved by modulating the reset membrane potential: neurons that frequently spike are hyperpolarized less (i.e., given a higher reset potential), allowing them to fire more easily in subsequent time steps. Conversely, non-spiking neurons are inhibited by assigning a lower reset membrane potential, making them less likely to spike in response to future inputs.
To effectively apply lateral inhibition, the network must distinguish neurons with strong activation from those with weak responses. The winner-take-all (WTA) mechanism identifies the most responsive neuron at each time step by comparing membrane potentials across all neurons receiving the same input. The neuron with the highest membrane potential is designated the winner, assuming that it has stronger synaptic connectivity and better captures the input feature.
Once a winner neuron is selected and its membrane potential exceeds the threshold, it generates a spike and receives less inhibition in the next time step (i.e., a favorable reset state), thereby reinforcing its responsiveness. Conversely, non-winner neurons are suppressed through stronger inhibition, making it more challenging for them to spike in subsequent steps. Only the winner neuron is allowed to update its synaptic weights, effectively encoding the input feature through competition-based learning. The winning neuron is then indexed for use during the inference phase, since it is considered to have best captured the input characteristics. Figure 2 illustrates how lateral inhibition and WTA enhance the firing activity of neurons that are highly responsive to specific input features.
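The interaction of WTA and lateral inhibition can be summarized in a few lines. The sketch below assumes, following the constants in Table 1, that the winner is reset to the milder V_hyper level while all losers receive the stronger V_inhibit reset; this mapping of the two reset levels is our reading of the text, not an explicit statement in it.

```python
import numpy as np

def wta_inhibit(u, u_th, v_hyper=-20.0, v_inhibit=-30.0):
    """Select the winner by membrane potential and apply lateral inhibition.

    Assumed mapping of Table 1 constants: winner -> v_hyper (less
    hyperpolarization), losers -> v_inhibit (stronger inhibition).
    """
    winner = int(np.argmax(u))          # most responsive neuron this step
    fired = bool(u[winner] >= u_th)     # winner spikes only above threshold
    if fired:
        u = np.full_like(u, v_inhibit)  # losers: harder to spike next steps
        u[winner] = v_hyper             # winner: favorable reset state
    return u, winner, fired
```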
In conventional neuromorphic systems, when a neuron spikes, only that neuron enters a refractory period while others continue integration. This behavior improves biological fidelity and reduces switching of the spiking neuron, but it also leaves many non-firing neurons repeatedly performing useless LIF updates over the time window. To address this, we introduce a unified refractory-enabled router (UREN): once the winner neuron fires, the router asserts a global refractory mask so that all neuron updates are synchronously halted for a fixed window, thereby suppressing redundant computation and mitigating overfitting due to excessive weight potentiation.
Figure 3 contrasts the baseline neuron-level refractory with the proposed unified refractory timing.
$$R[t] = \begin{cases} 1, & t_{\mathrm{fire}} \le t < t_{\mathrm{fire}} + T_{\mathrm{ref}}, \\ 0, & \text{otherwise}, \end{cases} \qquad (3)$$

$$U_j[t] = (1 - R[t])\left( U_j[t-1] + \sum_i W_{ij} S_i[t] - \lambda \right) + R[t]\,U_{\mathrm{reset}}. \qquad (4)$$
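A behavioral sketch of Equations (3) and (4): a single scalar countdown plays the role of R[t], and while it is nonzero the whole population is pinned to U_reset with integration skipped, mirroring what the router's global clock gating achieves in hardware. The function is illustrative, not the authors' implementation.

```python
import numpy as np

def unified_refractory_step(u, w, spikes_in, leak, u_reset, refractory_left):
    """Population update under the unified refractory mask (Eqs. (3)-(4)).

    refractory_left > 0 corresponds to R[t] = 1: every neuron is held at
    u_reset and the integration term is skipped entirely.
    """
    if refractory_left > 0:                         # R[t] = 1: layer gated
        return np.full_like(u, u_reset), refractory_left - 1
    psp = w[np.flatnonzero(spikes_in), :].sum(axis=0)
    return u + psp - leak, 0                        # R[t] = 0: normal update
```

The early return is exactly where the hardware saves energy: instead of computing updates and scaling them by (1 − R[t]), the router simply withholds the clock from every neuron core.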

2.2. Learning Rule

A key characteristic of information transmission in SNNs is its temporal nature. Input spikes influence the membrane potential at each time step; additionally, output spikes occur at distinct time steps. By leveraging this temporal information, synaptic weights between neurons are updated in real-time at each time step using spike-timing-dependent plasticity (STDP)—a learning mechanism based on spike timing. When a pre-synaptic (input) spike occurs before a post-synaptic (output) spike, it is referred to as long-term potentiation (LTP), leading to an increase in synaptic weight. Conversely, when a post-synaptic spike occurs before a pre-synaptic spike, it is referred to as long-term depression (LTD), leading to a decrease in synaptic weight. Equations (5) and (6) describe the weight changes due to LTP and LTD, respectively, where η represents the learning rate, and A + , A , τ + , and τ are STDP learning hyperparameters.
If $t_{\mathrm{post}} > t_{\mathrm{pre}}$, then LTP:
$$W_{ij} = W_{ij} + \eta A_{+} e^{-|\Delta t|/\tau_{+}} \qquad (5)$$
If $t_{\mathrm{post}} < t_{\mathrm{pre}}$, then LTD:
$$W_{ij} = W_{ij} - \eta A_{-} e^{-|\Delta t|/\tau_{-}} \qquad (6)$$
While conventional STDP offers biologically plausible learning by considering all spike pairs across the entire time window, its hardware implementation is resource-intensive. Specifically, computing Δ t for every pre- and post-synaptic spike pair requires significant memory and computational overhead, especially in high spike-rate scenarios. This results in poor scalability and increased power consumption, which are critical limitations in edge AI environments.
To address this, the proposed architecture adopts a more hardware-friendly variant—Nearest STDP. Instead of calculating weight updates for all spike pairs, Nearest STDP considers only the pre-synaptic spike that is temporally closest to each post-synaptic spike [20]. As illustrated in Figure 4, this significantly reduces memory requirements, as only the most recent pre-synaptic spike needs to be retained. By leveraging a shift register to track recent spikes and compute relative delays, this method achieves substantial hardware efficiency compared to conventional STDP.
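The following sketch applies the nearest-neighbor rule to the winner's incoming weights, keeping only one timestamp per input rather than the full spike history; constants follow Table 1, and the clipping bounds anticipate the normalization described in Section 3.3. It is a functional sketch, not the hardware datapath.

```python
import numpy as np

def nearest_stdp(w_col, last_pre_t, t_post, eta=0.01, a_plus=0.8,
                 a_minus=0.3, tau_plus=8.0, tau_minus=5.0):
    """Nearest-neighbor STDP for the winner's incoming weights (Eqs. (5)-(6)).

    w_col      : weights into the winner neuron, shape (n_in,)
    last_pre_t : time of each input's most recent spike, -inf if none yet
    t_post     : time of the post-synaptic (winner) spike
    Inputs that never spiked have dt = inf, so exp(-inf) = 0 and they
    receive no update.
    """
    dt = t_post - last_pre_t                # nearest pre spike only
    ltp = dt >= 0                           # pre preceded post -> potentiate
    dw = np.where(ltp,
                  eta * a_plus * np.exp(-np.abs(dt) / tau_plus),    # Eq. (5)
                  -eta * a_minus * np.exp(-np.abs(dt) / tau_minus)) # Eq. (6)
    return np.clip(w_col + dw, 0.0, 1.5)    # W_min / W_max from Table 1
```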
Although Nearest STDP introduces a trade-off, such as the potential loss of historical spike information and reduced capacity to learn complex temporal patterns, this is an acceptable compromise in the context of edge AI systems, where real-time responsiveness, power efficiency, and low-resource operation are paramount.

3. Hardware Architecture

3.1. System Architecture Overview

Figure 5 illustrates the overall system architecture of the NoC with the neuron cores and the unified refractory-enabled neuron (UREN) router scheme. A spike from a specific input neuron is propagated to all output neurons through their respective synapses because of the fully connected nature of the target SNN. We adopt a star routing NoC architecture to support this dense connectivity while ensuring hardware scalability as the number of neurons increases. Spike packets arriving from an external host or connected cores are multicast to all neuron cores through the router. In contrast to mesh-based NoCs optimized for spatial locality, the proposed edge-oriented architecture benefits from deterministic one-hop multicast and a reduced router count, which aligns with the fully connected firing patterns of single-layer online STDP networks.
Conventional 2D mesh-based NoCs deliver spike packets by attaching a destination index and transferring the packet across multiple hops [21]. This approach is suitable for large-scale neuromorphic systems owing to its high scalability; however, it suffers from increased latency as the hop count grows and consumes more hardware resources as the number of routers increases, particularly due to the use of LUTs.
In contrast, the star routing NoC limits the number of connectable cores, which places some constraints on scalability. However, it offers reduced hardware overhead due to its simplified routing logic. Although multicasting can incur latency and power overhead when many input spikes arrive within a short time window, this is unlikely in the rate-coded SNN targeted here, where the spike rate is typically around 20%, so spike traffic is inherently sparse. The router detects spike events from the result packet and applies unified refractory control across all neuron cores; this prevents unnecessary input and computation within each neuron during the refractory period via clock gating, thereby reducing dynamic power consumption. Given that hardware footprint and energy efficiency are crucial in edge AI applications, the proposed router scheme was selected to balance this trade-off in the inter-core communication architecture.
Incoming filtered spike packets from external sources or other cores are stored in internal FIFO buffers. Utilizing FIFO buffers enables the router to process incoming packets continuously with reduced write latency, even under bursty traffic conditions. The arbiter monitors the occupancy of the FIFO and—if full—triggers a stall signal to the external network interface using a handshake protocol to prevent overflow and associated overhead. Each packet is decoded sequentially by the packet decoder. Based on the decoded neuron mask bits, the router dynamically connects clock signals only to the targeted neuron cores while disabling clock delivery to idle cores via clock gating. This event-driven operation ensures that only the necessary cores are activated, effectively minimizing idle power consumption.
A unified refractory and core mask enable block is integrated into the router to support global computation control. It receives a local result packet from each core and selects the global winner neuron core. If the global winner’s potential exceeds the spiking threshold, the unified refractory timer is triggered to manage clock gating across all cores. During the refractory window, only the winner core continues weight updates, while all other cores remain idle with their clocks gated. Meanwhile, the router continues to receive spike packets from the host, and the refractory module monitors the time step within each packet to count down the refractory period. Once the unified refractory time elapses, clock signals are re-enabled across all cores, resuming neuron operations.
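The control flow of this block can be captured in a small behavioral model. The class below is a sketch under assumed interfaces (result packets carrying (core, neuron, potential) triples and a boolean clock enable per core); the real RTL uses per-core clock-gating cells rather than an enable list.

```python
class UrenRouter:
    """Behavioral sketch of the unified refractory control in the router."""

    def __init__(self, n_cores, t_ref=15, u_th=15):
        self.n_cores = n_cores
        self.t_ref = t_ref
        self.u_th = u_th
        self.refractory_left = 0
        self.winner_core = None

    def on_result_packets(self, local_winners):
        """local_winners: list of (core_id, neuron_id, potential) per core."""
        core_id, neuron_id, potential = max(local_winners, key=lambda r: r[2])
        if potential >= self.u_th:          # global winner actually spikes
            self.refractory_left = self.t_ref
            self.winner_core = core_id
        return core_id, neuron_id

    def clock_enables(self):
        """Per-core clock enables for the current time step."""
        if self.refractory_left == 0:
            return [True] * self.n_cores    # normal operation: all cores run
        # During the unified refractory window only the winner core stays
        # clocked so it can finish its STDP weight updates (Section 3.1).
        return [i == self.winner_core for i in range(self.n_cores)]

    def tick(self):
        """Advance one time step, counting down the refractory window."""
        if self.refractory_left > 0:
            self.refractory_left -= 1
```

A testbench-style driver would call on_result_packets() once per time step, gate each core's step on clock_enables(), and then call tick().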
This unified control strategy not only simplifies the refractory logic by centralizing control at the router level, but also minimizes switching activity across the entire system. Such fine-grained, dynamic power gating is particularly advantageous in edge AI scenarios where energy efficiency and compact hardware design are crucial requirements.

3.2. Network Interface

Figure 6 illustrates the structure of the Network Interface (NI) and the spike packet format. The NI serves as a communication bridge between the router and the neuron cluster, ensuring data integrity through a handshake interface. Internally, the NI is composed of a decoder that converts incoming spike packets into input data for neurons, an encoder that formats output spikes and other computation results into packets for the host, and a packet buffer for intermediate storage.
The NI encodes the spike event of the winner neuron along with its labeled ID into a packet via the encoder and transmits it back to the router. This mechanism enables the Unified Refractory Time control system to function in a network-synchronized manner.
Upon receiving an input spike packet from the router, the NI decodes the contents. The MSB of the packet represents the mode bit: when set to 1, on-chip learning is enabled; when set to 0, the system performs inference by executing LIF computations without updating synaptic weights. In learning mode, spike events are stored in a register upon firing, and the STDP module subsequently utilizes this information to update synaptic weights based on the relative spike timing.
Additionally, each packet includes a time step field, eliminating the need for a separate counter within the neuron core. When a spike occurs among the 512 time-multiplexed neurons, the corresponding neuron index is included as a Neuron ID. Spike data are organized into 16-bit batches, and the packet also contains a batch count field to enable rapid delivery of all spike events within a single time step.
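To illustrate the decode path, the sketch below unpacks one input spike packet word. The paper specifies the fields (mode bit in the MSB, time step, batch index, 16-bit spike data) but not their bit widths, so the 32-bit packing used here is a hypothetical layout for illustration only.

```python
from dataclasses import dataclass

@dataclass
class SpikePacket:
    mode: int        # 1 = on-chip learning, 0 = inference (MSB of the packet)
    time_step: int   # global time step carried in-packet (no core counter)
    batch: int       # index of this 16-bit spike batch within the time step
    spikes: int      # 16-bit slice of the input spike vector

def decode_packet(word, width=32):
    """Decode one spike packet word (assumed 1/9/6/16-bit field layout)."""
    mode      = (word >> (width - 1)) & 0x1
    time_step = (word >> 22) & 0x1FF     # 9 bits cover 350 time steps
    batch     = (word >> 16) & 0x3F      # 16 batches cover 256 inputs
    spikes    = word & 0xFFFF
    return SpikePacket(mode, time_step, batch, spikes)

# Example under the assumed layout: learning mode, time step 3, batch 0,
# with input neurons 0 and 5 firing.
pkt = decode_packet((1 << 31) | (3 << 22) | (1 << 0) | (1 << 5))
```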

3.3. Neuron Core

Figure 7 illustrates the block diagram of the neuron core and the LIF neuron cell. Each neuron core consists of a neuron cell capable of performing 512 time-multiplexed LIF computations, a lateral inhibition block, a learning block responsible for weight updates, an addressing crossbar that receives spike packets from the NI and generates the corresponding addresses for SRAM access, a weight SRAM storing synaptic weights for the 512 neurons connected to 256 input neurons, and a neuron SRAM storing neuron-specific parameters.
The spike packets received from the NI are forwarded to the addressing crossbar. This block features a crossbar-style interconnect that translates each input spike index into corresponding memory addresses, which are then sent to both the weight SRAM and neuron SRAM. Each LIF neuron cell fetches the synaptic weight from the weight SRAM and the membrane potential from the neuron SRAM. The membrane potential is then updated using an accumulator, followed by the subtraction of the leak value to perform the LIF operation. The updated potential is compared against a predefined threshold stored in the neuron SRAM to determine whether the neuron fires. If a spike is generated, the spike generator encodes the neuron index and spike information, which is sent to the lateral inhibition block to determine the winning neuron. The LIF neuron cell is implemented in a highly simplified form to minimize hardware resource usage.
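The per-time-step behavior of the core can be sketched as a sequential loop over the 512 time-multiplexed neurons, with the two SRAMs modeled as arrays. This is a functional sketch, not the cycle-accurate datapath.

```python
import numpy as np

def core_time_step(weight_sram, neuron_sram, input_spikes,
                   leak=0.2, u_th=15.0):
    """Evaluate the 512 time-multiplexed LIF neurons of one core.

    weight_sram : (256, 512) synaptic weights, one column per output neuron
    neuron_sram : (512,) stored membrane potentials
    input_spikes: (256,) binary spikes decoded from the NI packet
    The real core streams one neuron's state through the single shared
    LIF datapath per cycle; here the loop plays that role.
    """
    fired_inputs = np.flatnonzero(input_spikes)      # addressing crossbar
    local_winner, best_u = None, -np.inf
    for j in range(neuron_sram.shape[0]):            # one neuron per "cycle"
        u = neuron_sram[j] + weight_sram[fired_inputs, j].sum() - leak
        neuron_sram[j] = u                           # write back potential
        if u > best_u:                               # feeds the lateral-inhibition block
            local_winner, best_u = j, u
    spiked = best_u >= u_th
    return local_winner, best_u, spiked
```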
The control unit manages neuron operations based on the current time step and operating mode, while coordinating with the NI to avoid data conflicts. Additionally, when a spike is generated, the corresponding information is sent back to the NI for packet encoding and transmission.
Figure 8 shows the block diagram of the hardware implementation of the nearest STDP learning algorithm. The decoded pre-synaptic spike information from the Network Interface is stored in a shift register that maintains spike history for up to 5 previous time steps relative to the current time step. As each time step progresses, the shift register discards the oldest spike data and stores the most recent spike event.
When a spike is generated from the winner neuron, its spike time is captured. The learning block then compares the most recent spike time of each input neuron (pre-synaptic) with the post-synaptic spike time to compute the temporal difference ( Δ t ). Based on this time difference, a LUT is used to determine the corresponding weight update value ( Δ w ).
The current synaptic weight is retrieved from the weight SRAM and updated by adding Δ w . To prevent weight values from diverging excessively or diminishing too much—a known behavior in STDP—the updated weight is normalized using predefined minimum and maximum bounds. The new weight is then written back into the weight SRAM, completing the STDP learning update. By employing the nearest STDP rule, the system significantly reduces memory usage as it does not need to store spike data for all time steps. Additionally, by applying the WTA mechanism, only the synaptic weights connected to the winner neuron are updated, further reducing the overall computational load.
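Because the shift register bounds the timing difference to at most five previous time steps, the exponential factors of Equations (5) and (6) collapse to a handful of constants that can be precomputed into a LUT. The sketch below shows this lookup with the subsequent bounding; the six-entry tables are our assumed discretization of the window.

```python
import numpy as np

# Precomputed |dt| -> dw entries for the 5-step history window kept in the
# shift register; values follow Eqs. (5)-(6) with the Table 1 constants.
ETA, A_P, A_M, TAU_P, TAU_M = 0.01, 0.8, 0.3, 8.0, 5.0
LTP_LUT = [ETA * A_P * np.exp(-d / TAU_P) for d in range(6)]   # dt = 0..5
LTD_LUT = [ETA * A_M * np.exp(-d / TAU_M) for d in range(6)]

def apply_stdp_lut(w, dt, w_min=0.0, w_max=1.5):
    """Update one synapse via table lookup instead of an online exponential.

    dt >= 0 : pre preceded post -> LTP; dt < 0 : post preceded pre -> LTD.
    """
    if dt >= 0:
        w += LTP_LUT[min(dt, 5)]
    else:
        w -= LTD_LUT[min(-dt, 5)]
    return float(np.clip(w, w_min, w_max))   # bound to prevent divergence
```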

4. Experimental Results

4.1. Experimental Setup

This section describes the experimental setup used to evaluate the UREN-based neuromorphic processor in terms of accuracy and computational efficiency. The proposed architecture was verified using both MATLAB R2024b- and ModelSim 2020.4-based methodologies. First, the RTL design was implemented in Verilog HDL and synthesized for FPGA and ASIC targets to assess processing speed and power efficiency. Subsequently, MATLAB simulations were conducted to quantify the computation reduction achieved by the Unified Refractory Time mechanism relative to a baseline architecture. The hyperparameters used in all simulations are summarized in Table 1. In every experiment, neuron-state parameters (e.g., membrane potential) were represented with 16-bit precision, while synaptic weights were quantized to 8 bits. The STDP hyperparameters and weight bounds follow commonly adopted values for online, single-layer STDP networks, ensuring stable competition while preventing unbounded weight growth. Weight normalization is applied after each update to maintain long-term numerical stability.
To further analyze system behavior, a MATLAB-based SNN simulator was developed to measure both neuron-operation counts and spike activity. The simulator configuration followed the parameters listed in Table 1. A total of 60,000 training and 10,000 test images from the MNIST dataset were encoded into spike trains via rate coding. Each image was mapped to a 16 × 16 input-neuron array, and the output layer consisted of 4096 neurons distributed across eight cores (512 neurons per core). Each simulation ran for 350 time steps with synaptic weights initialized uniformly within the range [ 0 , 0.1 ]. Learning-related parameters (weight limits, learning rate, and STDP constants) and LIF neuron parameters (leak rate, lateral-inhibition thresholds, refractory time, and firing threshold) followed Table 1. Using this configuration, spike rates and neuron-operation counts were recorded for each architecture and used for comparative analysis. Since the proposed architecture targets unsupervised, single-layer online STDP for edge inference, the experiments focus on evaluating computational efficiency and scalability under this fixed learning rule, rather than optimizing absolute accuracy or comparing against deep multi-layer SNNs.
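For reference, rate coding of one input image can be sketched as below. The peak firing probability is an assumed scaling chosen to match the roughly 20% spike rate mentioned in Section 3.1, and pixel intensities are taken to be pre-normalized to [0, 1].

```python
import numpy as np

def rate_encode(image, n_steps=350, max_rate=0.2, rng=None):
    """Rate-code one image into an (n_steps, 256) binary spike train.

    image    : (16, 16) pixel array in [0, 1], already downsampled from MNIST
    max_rate : peak per-step firing probability (assumed scaling)
    Each pixel drives an independent Bernoulli spike process whose rate is
    proportional to its intensity.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = image.reshape(-1) * max_rate                  # per-input spike prob.
    return (rng.random((n_steps, p.size)) < p).astype(np.uint8)
```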
For hardware validation, the proposed processor incorporating the unified refractory neuron (UREN) clock-enable scheme and a star-routing NoC was prototyped on a Xilinx VMK180 evaluation board (device: XCVM1802-2MSEVSVA2197). The system was synthesized and implemented using Xilinx Vivado 2023.1 with a target operating frequency of 100 MHz, while an Arm Cortex-R5 processor was employed as the RPU. The FPGA resource utilization is summarized in Table 2. The reported 3.3 W FPGA power primarily reflects system-level activity, including I/O and global clock distribution, and therefore does not represent edge-level power efficiency. In this work, the FPGA prototype is used mainly to validate the full multi-core architecture, UREN routing behavior, and real-time network-level coordination under hardware execution.

4.2. Performance Evaluation

Figure 9a compares the neuron operations and spike rates under three configurations: a baseline without any refractory mechanism, a neuron-level refractory model in which each neuron applies an individual refractory period upon spiking, and the proposed UREN-based architecture. In the baseline configuration, approximately 1.4 M neuron operations were recorded with an average spike rate of 20.29%. Under the neuron-level refractory control, the number of neuron operations was reduced to approximately 245 K, with a spike rate of 5.71%. The proposed UREN-based architecture required only 172 K neuron operations, with a spike rate of 6.29%. These findings demonstrate that the proposed architecture achieves a reduction of more than 80% in neuron operations compared to the baseline. Although the spike rate of the UREN model is slightly higher than that of the neuron-level refractory control, the total number of neuron operations is approximately 30% lower. This is because the UREN router suppresses unnecessary computations at the network level through global clock gating, resulting in improved computational efficiency even when spike activity remains relatively high.
It should be emphasized that the operation-reduction evaluation is performed using baseline configurations that are widely adopted in single-layer online-STDP systems. These include both a standard implementation without refractory control and a model that applies conventional neuron-level refractory behavior. Since existing neuromorphic processors do not provide a directly comparable network-level refractory mechanism, these baselines represent the most appropriate and standardized reference points rather than arbitrary internal choices. Unlike conventional neuron-level approaches that suppress spike generation directly, the UREN mechanism focuses on controlling the actual timing of computation. This separation of spike activity from computation overhead is a key advantage for dynamic power optimization in event-driven edge AI processors. Additionally, the proposed scheme introduces only minimal hardware overhead at the router level, enabling simple yet effective event-driven optimization through clock gating.
As the number of image classes or the complexity of input patterns increases, unsupervised STDP-based neuromorphic processors require exponentially more neurons and synapses to maintain high classification accuracy. Conventional neuromorphic systems with a fixed number of on-chip neurons often fall short of meeting such scalability demands. To address this limitation, the proposed architecture adopts a star routing NoC that supports neuron-level scalability through core expansion. Figure 9b presents the classification accuracy and total neuron operations as the number of cores increases from one (512 neurons) to eight (4096 neurons). While the neuron operations increase linearly with the number of cores, the classification accuracy remains consistently stable across all configurations. At 4096 neurons, the measured classification accuracy was 86.1%. The resulting accuracy values fall within the expected range for single-layer, rate-coded online STDP networks. The key observation is that the proposed UREN-based coordination maintains stable learning behavior as the number of cores increases, rather than aiming to match the higher accuracy of deep or offline-trained SNNs. These results validate that the proposed UREN-based router architecture achieves both high computational efficiency and scalable performance.
To demonstrate the scalability of the proposed architecture, we implemented each local block as a functional module in a MATLAB simulation and evaluated image classification performance by varying the number of neurons. As shown in Figure 10, the classification accuracy consistently improves across all datasets—including basic image datasets such as MNIST and N-MNIST, as well as more complex datasets like Fashion MNIST and EMNIST—when the number of neurons increases. Specifically, the proposed system achieved 89.4% accuracy on MNIST and 80.79% on N-MNIST using 6400 neurons. In contrast, Fashion MNIST and EMNIST showed relatively lower accuracy due to higher visual similarity and complexity among classes. Although the absolute accuracy for some datasets remained modest, all experiments consistently showed an upward trend in accuracy as the number of neurons increased. This supports the architectural scalability and adaptability of the proposed system. Furthermore, the results indicate that the architecture can flexibly accommodate datasets with varying degrees of complexity.
Furthermore, to validate the consistency between the software model and the FPGA-based hardware implementation, the learning algorithm was also implemented in C using the same parameters as the MATLAB simulation. Figure 11a compares the learning runtime between the software (SW) and hardware (HW) implementations as the number of output neurons increases. While the SW runtime increases quadratically due to sequential weight updates, the HW implementation—composed of eight 512-neuron cores operating in parallel—maintains nearly constant latency, achieving up to 12× faster computation at 4096 neurons. To verify weight retention accuracy, the trained weights from the hardware implementation were compared with those obtained from MATLAB after training on the MNIST dataset. The histogram of squared error in Figure 11b shows that most weight deviations remain below 1 × 10 6 , demonstrating excellent fixed-point fidelity in the proposed architecture.

4.3. MPW Fabrication Results

The proposed neuromorphic processor was designed as a multi-core architecture interconnected through a UREN-based router; however, for silicon validation, a single-core prototype was fabricated to evaluate core-level energy efficiency. The chip was implemented in a fully digital 28 nm FD-SOI process. Figure 12 shows the layout image and key chip specifications. The single core operates at 1.1 V and 100 MHz, consuming 1.15 mW of power. During learning at 100 MHz, it achieves an energy efficiency of 22.95 pJ/SOP. In this work, one synaptic operation (SOP) is defined as the elementary synapse-level event that includes a weight fetch, membrane update, and—when applicable—its STDP-based weight adjustment. This definition is consistent with that used in prior neuromorphic processors such as the TrueNorth system presented at SC 2014 [22], where energy efficiency is expressed in terms of synaptic operations per second (SOP/s) and per joule (SOP/J). Adopting this standardized definition enables meaningful pJ/SOP comparison across neuromorphic architectures. The layout integrates high-density SRAM blocks for synaptic weight storage and dedicated registers for parameter updates, achieving optimized area and power efficiency at the core level.
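As a consistency check, the reported core power and energy per operation imply a learning-time throughput of roughly

$$\frac{P}{E_{\text{SOP}}} = \frac{1.15\ \text{mW}}{22.95\ \text{pJ/SOP}} \approx 5.0 \times 10^{7}\ \text{SOP/s},$$

i.e., about 50 MSOP/s, or one synaptic operation every two cycles at 100 MHz. This throughput is derived from the reported figures rather than stated directly.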

4.4. Comparative Analysis

The comparison in Table 3 focuses on neuromorphic processors for which the network-on-chip organization, learning rule, and architectural configuration are explicitly documented and structurally compatible with single-layer online-STDP models. This ensures that the comparison reflects systems operating under comparable architectural assumptions and provides a consistent basis for evaluating the impact of the proposed UREN-based control and routing scheme. Large-scale neuromorphic platforms such as Loihi [23] and DYNAP-SE [5] pursue different design objectives (e.g., multi-layer networks, programmable learning rules, and large mesh-based fabrics) and therefore are not included in the quantitative table, but they represent complementary neuromorphic directions to which the proposed UREN-based NoC concept could potentially be extended. Unlike earlier mesh-based designs that employ local refractory control, the proposed architecture integrates a UREN-based router featuring unified refractory timing, enabling global clock gating with minimal control overhead. Rather than targeting high absolute accuracy, the proposed system maintains a reasonable level of online-learning accuracy within the expected range for single-layer STDP networks, while significantly reducing neuron operations through centralized refractory coordination. Furthermore, the star-topology NoC adopted in this work supports multicast communication with a single-hop delay, minimizing transmission latency and enhancing real-time suitability for edge AI applications.

5. Conclusions

In this paper, we proposed a novel NoC architecture with a unified refractory-enabled neuron (UREN)-based router scheme for low-power SNN systems targeting edge AI applications. The proposed architecture enables global clock gating across all neuron cores by adopting a unified refractory control mechanism at the router level, thereby significantly reducing the redundant computations typically incurred by local refractory logic. The system also incorporates multicasting in a star routing topology to further reduce communication overhead. Experimental results demonstrated that the proposed UREN-based design can maintain reasonable online-learning classification accuracy while reducing overall neuron operations by approximately 30% compared to a neuron-level refractory baseline under the same single-layer STDP configuration. This structural optimization directly contributes to lowering dynamic power consumption without compromising inference performance. With its efficient control logic, low-latency routing, and suitability for SNN operation, the proposed architecture is well suited for real-time, low-power neuromorphic inference in energy- and compute-constrained edge AI environments.
While the proposed architecture is optimized for single-layer, rate-coded online STDP learning, it also embodies an intentional design trade-off. The unified refractory mechanism prioritizes system-wide synchronization and computation efficiency, which naturally limits long-range temporal interactions that are less critical in the targeted learning model. Likewise, the centralized router aligns with the communication patterns and scale of compact edge deployments, where deterministic one-hop multicast provides clear benefits over multi-hop fabrics. Future research may extend this framework toward distributed or hierarchical variants of UREN and explore its applicability to deeper temporal SNN models.

Author Contributions

Conceptualization, S.-H.N. and D.-S.K.; methodology, S.-H.N. and D.-S.K.; investigation, S.-H.N.; data curation, S.-H.N.; writing—original draft preparation, S.-H.N.; writing—review and editing, S.-H.N. and D.-S.K.; supervision, D.-S.K.; project administration, D.-S.K.; funding acquisition, D.-S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the IITP (Institute of Information & Communications Technology Planning & Evaluation)—ITRC (Information Technology Research Center) grant funded by the Korean government (Ministry of Science and ICT) (IITP-2025-RS-2024-00438007).

Data Availability Statement

The data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Srinivasan, K.; Cowan, G. Subthreshold CMOS Implementation of the Izhikevich Neuron Model. In Proceedings of the 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, TX, USA, 27 May–1 June 2022; pp. 1062–1066.
2. Raymond, C.; Gutierrez, E. A low power and low area mixed-signal neuronal cell for spiking neural networks. In Proceedings of the 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), Lansing, MI, USA, 9–11 August 2021; pp. 313–316.
3. Yang, H.; Lam, K.Y.; Xiao, L.; Xiong, Z.; Hu, H.; Niyato, D.; Vincent Poor, H. Lead federated neuromorphic learning for wireless edge artificial intelligence. Nat. Commun. 2022, 13, 4269.
4. Frenkel, C.; Bol, D.; Indiveri, G. Bottom-up and top-down approaches for the design of neuromorphic processing systems: Tradeoffs and synergies between natural and artificial intelligence. Proc. IEEE 2023, 111, 623–652.
5. Moradi, S.; Qiao, N.; Stefanini, F.; Indiveri, G. A scalable multicore architecture with heterogeneous memory structures for dynamic neuromorphic asynchronous processors (DYNAPs). IEEE Trans. Biomed. Circuits Syst. 2018, 12, 106–122.
6. Zhu, Q.B.; Li, B.; Yang, D.D.; Liu, C.; Feng, S.; Chen, M.L.; Sun, D.M. A flexible ultrasensitive optoelectronic sensor array for neuromorphic vision systems. Nat. Commun. 2021, 12, 1798.
7. Aitsam, M.; Davies, S.; Nuovo, A.D. Neuromorphic Computing for Interactive Robotics: A Systematic Review. IEEE Access 2022, 10, 122261–122279.
8. Schuman, C.D.; Kulkarni, S.R.; Parsa, M.; Mitchell, J.P.; Date, P.; Kay, B. Opportunities for neuromorphic computing algorithms and applications. Nat. Comput. Sci. 2022, 2, 10–19.
9. Diehl, P.U.; Cook, M. Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Front. Comput. Neurosci. 2015, 9, 99.
10. Zhao, D.; Schrape, O.; Stamenkovic, Z.; Krstic, M. ImSTDP: Implicit Timing On-Chip STDP Learning. IEEE Trans. Circuits Syst. I Regul. Pap. 2025, 72, 868–881.
11. Eshraghian, J.K.; Ward, M.; Neftci, E.O.; Wang, X.; Lenz, G.; Dwivedi, G.; Lu, W.D. Training Spiking Neural Networks Using Lessons From Deep Learning. Proc. IEEE 2023, 111, 1016–1054.
12. Li, Z.; Lemaire, E.; Abderrahmane, N.; Bilavarn, S.; Miramond, B. Efficiency analysis of artificial vs. Spiking Neural Networks on FPGAs. J. Syst. Archit. 2022, 133, 102765.
13. cepdnaclk. Neuromorphic NoC Architecture for SNNs. GitHub repository, 2021. Available online: https://github.com/cepdnaclk/e18-4yp-Neuromorphic-NoC-Architecture-for-SNNs (accessed on 15 September 2025).
14. Fang, H.; Shrestha, A.; Ma, D.; Qiu, Q. Scalable NoC-based Neuromorphic Hardware Learning and Inference. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
15. Xue, J.; Xie, L.; Chen, F.; Wu, L.; Tian, Q.; Zhou, Y.; Ying, R.; Liu, P. EdgeMap: An Optimized Mapping Toolchain for Spiking Neural Network in Edge Computing. Sensors 2023, 23, 6548.
16. Wang, B.; Wong, M.M.; Li, D.; Chong, Y.S.; Zhou, J.; Wong, W.F.; Do, A.T. 1.7pJ/SOP Neuromorphic Processor with Integrated Partial Sum Routers for In-Network Computing. In Proceedings of the 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, CA, USA, 21–25 May 2023; pp. 1–5.
17. Cakin, A.; Dilek, S.; Tosun, S. Energy-aware application mapping methods for mesh-based hybrid wireless network-on-chips. J. Supercomput. 2024, 80, 15582–15612.
18. Ma, D.; Shen, J.; Gu, Z.; Zhang, M.; Zhu, X.; Xu, X.; Xu, Q.; Shen, Y.; Pan, G. Darwin: A neuromorphic hardware co-processor based on spiking neural networks. J. Syst. Archit. 2017, 77, 43–51.
19. Nambiar, V.P.; Pu, J.; Lee, Y.K.; Mani, A.; Koh, E.K.; Wong, M.M.; Do, A.T. Energy Efficient 0.5V 4.8pJ/SOP 0.93µW Leakage/Core Neuromorphic Processor Design. IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 3148–3152.
20. Kim, J.; Park, J.; Joo, S.; Jung, S.O. Efficient Hardware Implementation of STDP for AER Based Large-Scale SNN Neuromorphic System. In Proceedings of the 2020 35th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), Nagoya, Japan, 3–6 July 2020; pp. 1–4.
21. Nain, Z.; Ali, R.; Anjum, S.; Afzal, M.K.; Kim, S.W. A Network Adaptive Fault-Tolerant Routing Algorithm for Demanding Latency and Throughput Applications of Network-on-a-Chip Designs. Electronics 2020, 9, 1076.
22. Cassidy, A.; Alvarez-Icaza, R.; Akopyan, F.; Sawada, J.; Arthur, J.; Merolla, P.; Datta, P.; Gonzalez Tallada, M.; Taba, B.; Andreopoulos, A.; et al. Real-Time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with 100× Speedup in Time-to-Solution and 100,000× Reduction in Energy-to-Solution. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), New Orleans, LA, USA, 16–21 November 2014.
23. Davies, M.; Srinivasa, N.; Lin, T.H.; Chinya, G.; Cao, Y.; Choday, S.H.; Wang, H. Loihi: A Neuromorphic Manycore Processor with On-Chip Learning. IEEE Micro 2018, 38, 82–99.
24. Akopyan, F.; Sawada, J.; Cassidy, A.; Alvarez-Icaza, R.; Arthur, J.; Merolla, P.; Imam, N.; Nakamura, Y.; Datta, P.; Nam, G.; et al. TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2015, 34, 1537–1557.
25. Yerima, W.Y.; Ikechukwu, O.M.; Dang, K.N.; Abdallah, A.B. Fault-Tolerant Spiking Neural Network Mapping Algorithm and Architecture to 3D-NoC-Based Neuromorphic Systems. IEEE Access 2023, 11, 52429–52443.
26. Frenkel, C.; Legat, J.D.; Bol, D. MorphIC: A 65-nm 738 k-synapse/mm² quad-core binary-weight digital neuromorphic processor with stochastic spike-driven online learning. IEEE Trans. Biomed. Circuits Syst. 2019, 13, 999–1010.
Figure 1. Overview of the proposed star-topology-based NoC architecture for multicasting spike packets.
Figure 2. Neuron Feature Extraction Process Using WTA and Lateral Inhibition.
Figure 3. Baseline neuron-level refractory vs. proposed unified refractory (UREN). When the winner fires at $t_{\mathrm{fire}}$, the router asserts a global mask $R[t] = 1$ for $T_{\mathrm{ref}}$ steps, keeping all neurons at $U_{\mathrm{reset}}$ and skipping integration. This avoids useless updates of non-firing neurons and prevents aggressive weight growth during dense activations.
Figure 4. Comparison between all-to-all STDP and nearest-neighbor STDP learning methods.
Figure 5. Overall system architecture integrating neuron cores and the unified refractory-enabled router (UREN). The router multicasts spike packets, manages global refractory masking, and synchronizes neuron operations across all cores. By coordinating clock gating and packet arbitration, the system minimizes redundant computation and power consumption for edge-AI neuromorphic applications.
Figure 6. Network Interface (NI) and spike-packet format. The NI bridges the router and neuron cores via handshake I/F and buffering; packets carry mode, core-mask, time-step, batch-index, and spike data to support both learning and inference. It also returns the winner’s labeled ID to keep the unified refractory control synchronized across the network.
Figure 7. Neuron-core block diagram implementing LIF computation with 512 time-division-multiplexed (TDM) neurons. Incoming spike indices are translated by the addressing crossbar to access weight/neuronal SRAMs; the learning module and lateral-inhibition block coordinate updates and competition under the NI/FSM control. This modular design enables compact, energy-efficient on-chip SNN learning.
Figure 8. Hardware of the nearest-neighbor STDP. Pre-synaptic history is stored in shift registers, and upon a post-synaptic spike, the most recent pre-synaptic spike is selected to compute $\Delta t$. A LUT maps $\Delta t$ to $\Delta w$, which is bounded and normalized before being written back to the weight SRAM, eliminating the need to track all spike pairs.
Figure 9. (a) Spike rate and neuron operation comparison under different refractory control schemes; (b) scalability of the proposed UREN-based processor.
Figure 10. Classification accuracy across datasets with increasing neuron counts.
Figure 11. (a) Comparison of software (SW) and hardware (HW) learning runtime; (b) histogram of squared weight error between MATLAB and FPGA results.
Figure 12. Chip layout and key specifications.
Table 1. Simulation Parameters. # denotes the number of neurons.

Parameter | Value
--- | ---
# Input Neurons | 256
# Output Neurons | 4096
Time Steps | 350
Initial Weight Range | [0, 0.1]
W_min / W_max | 0 / 1.5
λ (leak rate) | 0.2
η (learning rate) | 0.01
A+ / A− | 0.8 / 0.3
τ+ / τ− | 8 / 5
V_reset | 0
V_inhibit | −30
V_hyper | −20
Refractory Time | 15
θ (threshold) | 15
Table 2. FPGA Resource and Power Summary on VMK180 Evaluation Board.

Resource/Parameter | Utilization | Available | Utilization (%) | Remarks
--- | --- | --- | --- | ---
LUT | 68,551 | 1,143,000 | 6.0 | Logic implementation
FF | 55,201 | 2,286,000 | 2.4 | Sequential elements
BRAM (18 k) | 4 | 2160 | <0.1 | On-chip buffering
URAM (36 k) | 32 | 960 | 3.3 | Weight storage
Dynamic Power | 3.319 W | — | — | Clocks 6%, Logic 4%, I/O 37%
Worst Negative Slack (Setup) | 0.367 ns | — | — | Timing met
Total Negative Slack (Hold) | 0.267 ns | — | — | Timing met
Pulse Width Slack | 0.003 ns | — | — | All constraints met
Table 3. Comparison with Prior Neuromorphic Architectures.

Reference | Akopyan et al. [24] | Yerima et al. [25] | Frenkel et al. [26] | This Work
--- | --- | --- | --- | ---
Cores | 4096 | 64 | 4 | max. 8
Neurons/core | 256 | 256 | 512 | 512
Synaptic Precision | 1-bit | 8-bit | 1-bit | 8-bit
NoC Topology | 2D Mesh | 3D Mesh | Hierarchical | Star Topology (UREN Router)
Max. Hop Delay | 4 hops | 4 hops | — | 1 hop
Learning Rule | — | STDP | S-SDSP | Nearest-STDP
Refractory Control | No | Local Refractory | No | Unified Refractory
MNIST Accuracy (%) | 99.42 (offline) | 79.4/84.5 (online) | 97.8 (offline) | 86.1 (online)
Memory/core | — | 64 KB | 64 KB + 8 KB | 128 KB + 3 KB