Tolerating Permanent Faults in the Input Port of the Network on Chip Router

Deep submicron technologies continue to develop according to Moore’s law allowing hundreds of processing elements and memory modules to be integrated on a single chip forming multi/many-processor systems-on-chip (MPSoCs). Network on chip (NoC) arose as an interconnection for this large number of processing modules. However, the aggressive scaling of transistors makes NoC more vulnerable to both permanent and transient faults. Permanent faults persistently affect the circuit functionality from the time of their occurrence. The router represents the heart of the NoC. Thus, this research focuses on tolerating permanent faults in the router’s input buffer component, particularly the virtual channel state fields. These fields track packets from the moment they enter the input component until they leave to the next router. The hardware redundancy approach is used to tolerate the faults in these fields due to their crucial role in managing the router operation. A built-in self-test logic is integrated into the input port to periodically detect permanent faults without interrupting router operation. These approaches make the NoC router more reliable than the unprotected NoC router with a maximum of 17% and 16% area and power overheads, respectively. In addition, the hardware redundancy approach preserves the network performance in the presence of a single fault by avoiding the virtual channel closure.


Introduction
The advancements in the semiconductor industry towards deep-sub micrometer technologies were motivated by Moore's law in the last decades, allowing the integration of billions of transistors on a single chip [1,2]. These chips embed hundreds of memory modules and functional intellectual property (IP) blocks forming multi/many-processor systems-on-chip (MPSoCs) [2,3]. Increasing processing elements numbers in a single chip presents complexity in interconnection architecture making traditional bus-based interconnections unsuitable in MPSoCs [4]. Therefore, network on chip (NoC) interconnection architecture appeared as an alternative on-chip interconnection in MPSoCs due to its high performance, reusability, scalability, and fault tolerance characteristics [1, 4,5]. Figure 1 shows the MPSoCs scheme interconnected by NoC.
NoC [6][7][8][9] interconnection consists of a group of shared router nodes joined by shared channels or links, making this structure more efficient than buses. The connections between the routers and their attached processing elements (PEs) are constructed by network interfaces (NIs) [10]. The functionality of an NoC is defined by its topology, routing algorithm, flow control, and switching technique. Switching techniques are used to determine data format to traverse its path, such as the wormhole packet switching technique, which is the most widely adopted technique. Wormhole switching divides By going deeper into the sub micrometer technology and the continuous device downscaling, the on-chip interconnects and routers became more susceptible to faults. There are two different kinds of NoC potential reliability challenges, soft (transient) and hard (permanent) faults. Soft errors, which are unpredictable errors that occur for a short time and are caused by different sources, such as energetic particle strikes [13], electrical noise [14], or process variation [15], may affect the NoC by misrouting packets or making them invalid. Hard faults, which are permanent, persistently affect the circuit functionality from the time of their initiation and are caused by many sources, such as electromigration [16], and time-dependent dielectric breakdown [17].
Many researchers proposed fault tolerance techniques to improve NoC reliability, which is the network probability of delivering messages correctly in a certain time [9]. The faults may occur in some or all components of NoC, namely links and routers. Fault tolerance techniques are achieved mainly by redundancy, which can be time redundancy, such as data retransmission with the selfsame component, information redundancy by adding code for error correction, or spatial redundancy achieved by redundant elements [18]. The soft errors can be countered mainly by error control coding whereas the hard faults can be countered by spare module/gate for replacements, faulty part isolation, or fault tolerant routing [10].
The main part of NoC is the router, which carries out the essential task of steering and coordinating the data flow. It consists of three major parts: input buffers element, routing components, and crossbar component. The flits are stored in the input buffer component and routed by routing components, which are the route computation component, virtual channel allocation component, and switch allocation component. The crossbar component is used to transfer flit to destination node physically [10]. Some works, such as [19][20][21][22][23], addressed the permanent faults in the router by addressing the different components.
The proposed scheme tolerates permanent (hard) faults in very important fields of the input buffer (port) component called the virtual channel state fields, which they track the flits the moment the buffer is reserved until it is released. To the best of our knowledge, these fields have not been considered yet. The proposed scheme uses the hardware redundancy technique for tackling permanent faults in these fields. Furthermore, a built-in-self-test circuitry is integrated into the input port to detect the faults and configure the hardware redundancy circuit accordingly.
The rest of the paper is structured as follows: Section 2 presents related works addressing permanent faults in the NoC router. Section 3 describes the architecture of the baseline 4-stage NoC router. Section 4 presents the proposed design that tolerates permanent faults in the input buffer component of the NoC router. Section 5 discusses the results of the proposed design in terms of delay, area overhead, power consumption, reliability, and network performance. Section 6 concludes this paper.

Related Works
In this section, we review earlier works on fault-tolerant NoC router architectures that focus on tolerating permanent faults in the NoC router pipeline.
N-modular redundancy (NMR) approaches were adopted in the BulletProof router [19] to provide fault tolerance from permanent faults. However, NMR techniques significantly increase the area and power consumption because they require redundant reproductions of hardware. Vicis methodology [20] was suggested to tolerate permanent faults using inherent redundancy at two levels, network and router. At the network level, Vicis approach employs input port swapping supported by an adaptive routing algorithm to tolerate permanent faults. At the router level, it tolerates faults in the crossbar of the router using a bypass bus and in the datapath of the router using error-correcting codes (ECC). Although Vicis router provides better fault tolerance with lesser area overhead than BulletProof [19], it causes a reduction in performance with increasing faults. Furthermore, it can sustain at most one permanent fault at the buffer component.
In [21], the Row-Column (RoCo) Decoupled router was proposed, which divides the router into individual row and column modules. The division process was enabled by using decoupled parallel arbiters and smaller crossbars for row and column connections. Because the row and column modules are independent in their work, a permanent fault in one module does not affect the other module, and the router operates correctly with the fault-free module. The RoCo router tolerates permanent faults in the buffers via a bypass path, in the routing computation stage by using a double route computation in the next router, and in the switch allocation stage by sharing arbiters from virtual channel allocators. Fault tolerance property is not applied for the virtual channel allocation and the crossbar stages. This decomposition strategy is energy-efficient and reliable. However, its recovery designs for faulty modules cause latency penalties and performance degradations.
In [22], the researchers suggested a permanent fault tolerant router (PFTR) and further extended in the Shield router [24], which has the ability to tolerate multiple permanent faults in the pipeline stages. Its reliability is accomplished by adding minimal extra circuitry and exploiting temporal parallelism to individual stages of the baseline router pipeline. The PFTR tolerates permanent faults for the routing computation (RC) stage by using a redundant RC component for each input port, for the virtual channel allocation (VA) stage by using resource sharing, for the switch allocation (SA) stage via bypass path for each arbiter, and for the crossbar (XB) stage by providing two paths for each output port of the crossbar implemented by adding smaller sized decoders of the crossbar. The PFTR module provides better reliability than a baseline router and other fault-tolerant routers. However, the technique of using the idle time of existing fault-free resources degrades performance under high traffic. Furthermore, the PFTR scheme cannot tolerate faults in the input port.
A reliable NoC router based on a 2-stage generic router was presented in [23].

Baseline 4-Stage NoC Router
The baseline router composes of 4-stages pipeline (RC, VA, SA, and switch traversal ST) each one takes one clock cycle to be executed. In this section, the background of a generic 4-stage NoC router is illustrated. Figure 2 shows the architecture of a baseline 4-stage NoC router [9]. The router has five input ports and five output ports, each input port has 1: V demultiplexer, V:1 multiplexer, and V virtual channels (VC's). The control components of the router are the routing computation (RC) component, the virtual channel allocation (VA) component, the switch allocation (SA) component, and the crossbar (XB) which connects the input ports to the output ports of the router [23]. Since the baseline router uses wormhole packet switching technique, the packet is divided into three types of smaller fixed size flits namely a head flit which allocates router resources to the packet, a body flit which contains the packet payload, and a tail flit which de-allocates router resources from the particular packet [22]. Each packet starts with a head flit followed by zero or more body flits and a tail flit. In single flit packets, the head flit is also marked as a tail flit. Flit types are shown in Figure 3.    Figure 2 shows the architecture of a baseline 4-stage NoC router [9]. The router has five input 137 ports and five output ports, each input port has 1: V demultiplexer, V:1 multiplexer, and V virtual

160
In a baseline router, each input port is decomposed into groups of flit buffers. These buffers are 161 organized as a compound of fixed size queues as shown in Figure 2. Each queue is named a virtual 162 channel, and all virtual channels for the same input port share a physical channel of that port [23].

163
Therefore, each new arriving flit is stored in a specific VC buffer selected by VC identifier of that flit.  Figure 4 shows the pipeline of the baseline 4-stage NoC router. A packet is processed through the pipeline of the NoC router to be transferred to the next router or to the desired processing element (PE) of that router. The RC and VA stages are called (once per packet) because they perform computation for head flit of each packet. On the other hand, the SA and XB stages are called (once per flit) because they perform computation for all flits (head, body, and tail) of the packet [7]. During the processing of flits in each router component, the virtual channel state fields (G, R, O, P, and C) are changed according to that component as shown in Figure 4.

Route Info
Payload

160
In a baseline router, each input port is decomposed into groups of flit buffers. These buffers are 161 organized as a compound of fixed size queues as shown in Figure 2. Each queue is named a virtual 162 channel, and all virtual channels for the same input port share a physical channel of that port [23].

163
Therefore, each new arriving flit is stored in a specific VC buffer selected by VC identifier of that flit.

Input Buffer Component (Input Port Component)
In a baseline router, each input port is decomposed into groups of flit buffers. These buffers are organized as a compound of fixed size queues as shown in Figure 2. Each queue is named a virtual channel, and all virtual channels for the same input port share a physical channel of that port [23]. Therefore, each new arriving flit is stored in a specific VC buffer selected by VC identifier of that flit. All flits of the same packet are stored in the same virtual channel. Each input virtual channel is tracked by five virtual channel state vectors or fields called Global state (G), Route (R), Output VC (O), Pointers (P) and Credit count (C) fields [9]. The G field represents the status of the current virtual channel and takes one of the following states, either idle (I), routing (R), waiting for an output VC (V), active (A), or waiting for credits (C). The R field stores the output port for the packet and is set after the routing computation stage is completed. The O field holds the output VC of port R assigned to the current input VC, which is the result of the virtual channel allocation stage. The P field stores the head flit and tail flit pointers that point to the buffers in the input VC as shown in Figure 5. The C field indicates the credits count of available in downstream flit buffers for the output virtual channel O on output port R. Table 1 displays the function of each state field and a summary of the possible effects of faulty fields, which shows the importance of these fields to the flow of messages in the NoC router.
channel O on output port R. Table 1   Represents the status of the current virtual channel: idle (I), routing (R), waiting for an output VC (V), active (A), or waiting for credits (C).
The state of VC is changed. Many effects can occur but in general, the pipelined operation will not proceed correctly or may stop.

R (Route)
Stores the output port for the packet and is set after the routing computation stage is completed.
The packet is misrouted.
Holds the output VC of port R assigned to the current input VC which is the result of the virtual channel allocation stage.
The wrong O leads to wrong output VC. Packet(s) may be lost.

P (Pointers)
Stores the head flit and tail flit pointers that point to the buffers in the input VC.
The faulted P leads to pointing on the wrong location of buffers. Flits or even packet can be lost, and VC may stay unreleased.

C (Credit count)
Indicates the credits count of available downstream flit buffers for output virtual channel O on output port R.
If C is wrong, then the number of free buffers in output VC is wrong. Flits may be forwarded to a full buffer or flits may stay while the buffer is available.     It is controlled by the control signals generated by the switch allocator (SA) [22].  Represents the status of the current virtual channel: idle (I), routing (R), waiting for an output VC (V), active (A), or waiting for credits (C).

Routing Computation
The state of VC is changed. Many effects can occur but in general, the pipelined operation will not proceed correctly or may stop.

R (Route)
Stores the output port for the packet and is set after the routing computation stage is completed. The packet is misrouted.

O (Output VC)
Holds the output VC of port R assigned to the current input VC which is the result of the virtual channel allocation stage.
The wrong O leads to wrong output VC. Packet(s) may be lost.

P (Pointers)
Stores the head flit and tail flit pointers that point to the buffers in the input VC.
The faulted P leads to pointing on the wrong location of buffers. Flits or even packet can be lost, and VC may stay unreleased.

C (Credit count)
Indicates the credits count of available downstream flit buffers for output virtual channel O on output port R.
If C is wrong, then the number of free buffers in output VC is wrong. Flits may be forwarded to a full buffer or flits may stay while the buffer is available.

Routing Computation Logic Component (RC)
In a baseline router, all input ports have one shared routing computation (RC) component, which computes the output port that the packet should be headed to. This decision is taken according to the destination information that exists in the head flit of that packet and on the routing algorithm used. The output port number is stored in the Route (R) field of the virtual channel status fields. The RC component operates only on the head flit of a packet [22]

Switch Allocation Component (SA)
The switch allocation (SA) component determines which input VC of the input port can transfer its flits through the crossbar component to the output port in the following cycle [22]. It operates on all flits and updates the (C) field when successful.

Switch Traversal Component (Crossbar XB) or (ST)
The crossbar (XB) component connects the input ports with the output ports of the same router. It is controlled by the control signals generated by the switch allocator (SA) [22].

Alternative Router Pipelines
To reduce the number of cycles in the pipeline, the stages may work in parallel in a single cycle with speculation technique. A pipeline with 3-stages is designed by speculatively performing SA in parallel with VA. This can be further enhanced to a 2-stage pipeline where the flit is sent to the ST stage while the VA and SA are being executed at the same cycle. One stage pipeline is designed by combining the 3-stages are speculation (VA, SA, and ST) with the lookahead RC property, where the next output port is computed in the upstream router and sent with the head flit. These alternatives do not affect the VC state fields, thus, keeping their role crucial in all cases and the proposed protection solution can be incorporated in these pipelines as well. As a result, this work will consider the 4-stage router.

Proposed NoC Router
In this section, the proposed router is illustrated, where the focus is on tolerating permanent faults in the state fields of the input buffer component to increase its reliability. Other components or stages were tackled from faults to increase reliability by many researchers [20][21][22][23] and can be adopted to provide complete protection. The proposed buffer component employs the hardware redundancy technique to tolerate permanent faults in the input virtual channel state fields (G, R, O, P, and C). In addition to this hardware redundancy technique, the proposed buffer component provides a VC-closing technique and a detection mechanism. The latter is responsible for the detection of faults and controls the hardware redundancy and VC-closing accordingly.

Detection Mechanism
The proposed detection circuitry represents a built-in-self-test (BIST) mechanism that detects faults, identifies their location, and accordingly provides a decision applied for the virtual channel state fields and for the spare registers. The BIST mechanism aims to detect stuck at 0 and stuck at 1 faults which are the most widely used fault models, then writes the result in the status registers; a 4-bit register State1 for (G, R, O, and C) fields and a 2-bit register State2 for P field (head and tail pointers). Figure 6 shows the operation flow chart of the detection circuitry. The stuck at 1 fault test is first done by applying all zeros value to the testing fields and then checking the output of these fields at the next clock cycle. If the output is all zeros it indicates that there is no stuck at 1 fault found, otherwise the found fault is marked in the corresponding bits of status registers. Stuck at 0 fault test is similarly carried out by applying all ones and looking for faults represented by a zero at the output at the next clock cycle. The same registers are updated according to the test result as well. This mechanism generates the main outputs (test, State1, State2, and vc_closing_out) to be used by the proposed hardware redundancy that will be discussed in the next subsection. The first output (test signal) goes high to start the BIST process when (reset = 1) which means the router operation has started or (G = 3 and vc_release = 1) which means that the packet has left the VC. Otherwise, the test signal goes low, and the normal router operation is resumed. State1 and State2 registers identify the faulted input state field in the proposed hardware redundancy technique. The vc_closing_out signal goes high to close the input virtual channel when it the number of detected faults exceeds the maximum number of tolerated faults in the proposed fault tolerance technique.
The BIST mechanism requires three clock cycles to be completed. In the first cycle, all input VC state fields are set to zero but their outputs will be visible in the next cycle. In the second cycle, the state fields outputs are checked for any stuck-at-1 faults, and their inputs are set to one. In the last cycle, the new state fields outputs are checked for stuck-at-0 faults, and their inputs are set to zero to prepare them for the normal operation in the next cycle. Each input VC has an idle time of ten clock cycles between its release until its next reservation as shown in Figure 4. At this idle time the downstream router waits for credits indicating that the upstream router input VC is free. Therefore, the BIST exploits this idle time to perform a periodic fault detection to avoid interrupting the router operation. For a single stage pipeline with one cycle link traversal, the idle time of each input VC is four clock cycles (from 7 to 10) between its release until its next reservation as shown in Figure 7, which is still enough to accommodate the proposed BIST. four clock cycles (from 7 to 10) between its release until its next reservation as shown in Figure 7, 244 which is still enough to accommodate the proposed BIST.

Hardware Redundancy Technique
The VC status fields are split into two groups according to their role. The first group includes (G, R, O, C) which represent the control of the VC. The second group includes the P field, encompassing the head pointer (PH) and tail pointer (PT) which control the buffer read and write operations, respectively. The hardware redundancy technique is realized by adding two spare registers, Sp1 and Sp2. The former is used as a spare for the first group to replace one of the state fields (G, R, O, or C) when one is faulty. The size of Sp1 depends on the maximum size of state fields (G, R, O, or C). The Sp2 register represents the spare register for the second group, replacing PH or PT when one is faulty. The size of Sp2 depends on the number of buffers in the VC (size of head and tail pointers). The hardware redundancy technique can sustain a single fault in each group. The general scheme of the proposed hardware redundancy circuit, shown in Figure 8, depends on the outputs of the BIST circuit, State1 and State2, which are used to indicate the faulty field that should be replaced with Sp1 and Sp2, respectively. Figure 9 shows the internal design of the BIST and Group2. The Baseline Router VC FSM is the unaltered hardware that changes G, R, O, P, and C according to the VC status and generates the new (next) values of G, R, O, P, and C. The BIST Controller starts the test operation when (reset = 0) or (G = 3 and vc_release = 1) and controls the Pattern Generator, Faults Detection, and States.

VC-Closing Property
When the number of faults in the VC status fields exceeds the hardware redundancy fault tolerance capability, the VC will malfunction. This occurs when more than one fault affects the same group of VC status fields in the VC. As a result, VC-closing technique is used to close this faulty VC and inform the upstream router to avoid allocating packets to this VC. This is realized by adding a new signal for each VC in each input port (VC_closing_out signal) which goes out from the input port of the downstream router to the output port of the upstream router to prevent the latter from allocating this faulty VC during the VA stage, as shown in Figure 10. This technique preserves packets transfer through an input port in the event of VC failures until all VCs of that input port fail.

Results and Discussion
Our reliable router is compared with the baseline router from delay, area, power, and reliability perspectives. Both routers are developed in Verilog HDL, functionally verified using ModelSim, and then synthesized using Synopsys Design Compiler with 45 nm Nangate's Open Cell library. All analyses were conducted on a router with five input ports and five output ports. Different numbers of virtual channels were considered, namely 2 VCs, 4 VCs, and 8 VCs, with four flits per VC and 32-bit flit size. Figure 11 shows the delay of our reliable router compared to the baseline router for 2 VCs, 4 VCs, and 8 VCs with four flits per VC for each case. The delay is increased by 23%, 9%, and 11% for 2, 4, and 8 VCs, respectively, when our proposed fault detection and tolerance circuitries are added. The critical path delay results were found by repeating the synthesis process while changing the clock period until zero slack is obtained. The virtual channel allocation component in both the baseline and the proposed design represents the critical path, which takes the input virtual channel state fields as input to reserve the output virtual channel and saves the result in the input virtual channel state field O. Due to the additional multiplexers and demultiplexers and their corresponding selection signals the critical path has increased in the proposed design.   the critical path has increased in the proposed design. Figure 11. The delay of our reliable router compared to the baseline router for 2VCs, 4VCs, and Figure 11. The delay of our reliable router compared to the baseline router for 2VCs, 4VCs, and 8VCs. Figure 12 shows the area overhead of our reliable router compared to the baseline router for 2 VCs, 4 VCs, and 8 VCs at a clock frequency of 426 MHz, 339 MHz, and 222 MHz, respectively. The area of baseline router for 2 VCs, 4 VCs, and 8 VCs is increased by 17%, 17%, and 15%, respectively, when our proposed fault detection and tolerance circuitries are added. Figure 13 shows how the area is partitioned among the different router parts for the baseline and the spare routers for 4 VC case. It is clear how the input port consumes the highest portion of the router area. It can be seen that the input ports are affected by the highest amount of area overhead since they are the target of the proposed design. The output ports also suffer from slight area overhead due to the VC_closing technique. It is also important to indicate that the VC state fields in the baseline router consume an area of 1490 µm 2 , which is higher than that of the RC, SA, and ST components. Figure 12 shows the area overhead of our reliable router compared to the baseline router for 2 298 VCs, 4 VCs, and 8 VCs at a clock frequency of 426 MHz, 339 MHz, and 222 MHz, respectively. The 299 area of baseline router for 2 VCs, 4 VCs, and 8 VCs is increased by 17%, 17%, and 15%, respectively, 300 when our proposed fault detection and tolerance circuitries are added. Figure 13 shows how the area 301 is partitioned among the different router parts for the baseline and the spare routers for 4 VC case. It 302 is clear how the input port consumes the highest portion of the router area. It can be seen that the 303 input ports are affected by the highest amount of area overhead since they are the target of the 304 proposed design. The output ports also suffer from slight area overhead due to the VC_closing 305 technique. It is also important to indicate that the VC state fields in the baseline router consume an 306 area of 1490 µm 2 , which is higher than that of the RC, SA, and ST components.        Figure 14 shows the dynamic power consumption of our reliable router compared to the baseline router for 2 VCs, 4 VCs, and 8 VCs at a clock frequency of 426 MHz, 339 MHz, and 222 MHz, respectively. The static power is neglected, as it constitutes less than 1% of the total power consumption.

Power Analysis
The dynamic power consumption of the baseline router for 2 VCs, 4 VCs, and 8 VCs are increased by 16%, 11%, and 10%, respectively, due to the fault tolerance circuit and the BIST circuit.

347
The mean number of faults to cause failure is 15 ([2 + 28]/2) and the normalized area is 1.31, so the

Reliability Improvement
The silicon protection factor (SPF) metric is used to assess the reliability of our proposed reliable NoC router as compared to the baseline router and other state-of-the-art reliable routers. SPF is defined as the ratio of the mean number of faults required to cause a failure and the area overhead acquired due to the fault tolerance circuitry [19]. , and the fourth VC is active and having two faults. The SPF of the baseline is one in all cases because the minimum number of faults that made failure in the baseline router equals one, the maximum number of faults that can be tolerated in each stage equals zero, and the total area is one. So, from Equations (1) and (2), it can be inferred that the SPF of the baseline router for any number of VCs is one. As a result, the total number of faults for a single port is 11, and for all ports of the 5-ports router, the total maximum number of tolerated faults is 55 faults. Substituting the two values (min = 1 and max = 55) in Equations (1) and (2) infers that the SPF of the proposed router for 4 VCs with area overhead of 0.17 is 24. Similarly, the SPF for 2 VCs and 8 VCs equals 12 and 51, respectively. Figure 15 shows the SPF for our proposed router compared with the baseline router for 2 VCs, 4 VCs, and 8 VCs. The results indicate that for our proposed design the SPF increases when the number of VCs increases. Table 2 shows the comparison of area overhead and SPF for Poluri [24] and Wang [23], and our proposed router. Poluri's [24] SPF resulted from the minimum faults that cause failure in stages RC, VA, SA, or ST namely (2,4,2,2), and the maximum tolerated faults in all these stages (5 + 15 + 5 + 2 356 Figure 15. The Silicon Protection Factor for our proposed router compared with the baseline router

24.35
In addition to the SPF factor, we estimate the lifetime enhancement achieved by the proposed  Rather than suggesting protecting only the VC state fields, this work aims to highlight their importance and the possible reliability enhancement when protected. In addition, the proposed input port fault tolerance can be integrated into the other reliable routers to realize a complete, reliable router.
In addition to the SPF factor, we estimate the lifetime enhancement achieved by the proposed input port component compared to the input port component of the baseline 4-stage NoC router using the mean time to Ffailure (MTTF). The main equation of MTTF is: where 1, 2, 3 are the components of the system; FIT is the number of failures in billion hours. The MTTF of the input port component of the baseline router can be given by: MTTF baseline input port = 10 9 FIT buffer + FIT Group1 + FIT Group2 .
where FIT bu f f er is failure in time of the buffer component; FIT Group1 is failure in time of Group1 of VC state fields, and FIT Group2 is failure in time of Group2 VC state fields. The MTTF of the input port component of the reliable router is: MTTF reliable input bu f f er component = 10 9 FIT buffer + 1 MTTF reliable Group1 + 1 MTTF reliable Group2 (5) Each group in the proposed input port can work normally if the baseline group or the corresponding correction circuitry are fault-free. Accordingly, the MTTF of each group can be given by [23,24]: MTTF reliable Group1 = 10 9 FIT Group1 + 10 9 FIT correction Group1 + 10 9 FIT Group1 + FIT correctionGroup1 (6) MTTF reliable Group2 = 10 9 FIT Group2 + 10 9 FIT correction Group2 + 10 9 FIT Group2 + FIT correction Group2 (7) The FIT values of the fundamental components due to time-dependent dielectric breakdown (TDDB) can be found in [23,24]. Table 3 lists the fundamental components (FC), their corresponding FIT values as given in [23,24], total number of each FC, and the total FIT values in the input port component. Similarly, Tables 4 and 5 present the computed values for Group1 correction circuitry and Group2 correction circuitry, respectively. The FIT of each group correction circuitry is the sum of the FIT of the spare bits and state bits. Substituting the values of Table 3 into Equation (4), the value of MTTF of baseline input port component is found to be (699301) hours. From the values of the three tables (Tables 3-5), we substituted these values in Equations (5)-(7) the MTTF of the reliable input port component is found to be 781,250 h, which is higher than that of the baseline input port.

Latency Analysis
We considered an 8 × 8 mesh topology and each packet is composed of four flits with 32-bit each. A uniform traffic pattern with Bernoulli injection is used in the simulations. Figure 16 compares three cases: first case is when all VCs are working, the second case is when one VC is closed in the north input port component of the router (y = 4, x = 3), and the last case is when one VC is closed in the west input port component of the router (y = 3, x = 4). It can be seen that the performance degrades when one VC in one router out of 64 routers is closed for both west and north VC closed cases. The west port has higher performance degradation because in XY routing strategy the west port may send packets to any of the other ports whereas the north port may send to the south and local ports only. This gives an interesting result that requires further analysis, which is out of the scope of this work. Based on the results in the figure it is clear the importance of avoiding the closure of any VC. This was resolved in the proposed design by adding the spare registers (Sp1, Sp2) which allow the router to continue working at full performance in spite of the occurrence of a single fault in each group (Group1, Group2), and the proposed scheme closes a VC only when two faults affect the same group.

Conclusions
A reliable input buffer component in NoC router is proposed in this work. The proposed solution includes fault detection and fault tolerance techniques to tolerate permanent faults in critical fields of the input buffer component called G, R, O, P, and C fields which control the work of virtual channels. Hardware redundancy technique was used to enhance the router reliability with low area, power consumption, and delay overheads with respect to the baseline router. In addition, the proposed router achieved 11% higher reliability than other fault tolerant router designs, represented by the silicon protection factor metric. The protection by hardware redundancy has an important effect on preserving the network performance since one VC closure showed a clear effect on the average packet latency.