A Network Adaptive Fault-Tolerant Routing Algorithm for Demanding Latency and Throughput Applications of Network-On-A-Chip Designs

Scalability is a significant issue in system-on-a-chip architectures because of the rapid increase in on-chip resources. Moreover, hybrid processing elements demand diverse communication requirements, which system-on-a-chip architectures are unable to handle gracefully. Network-on-a-chip architectures have been proposed to address the scalability, contention, reusability, and congestion-related problems of current system-on-a-chip architectures. Reliability is a challenging aspect of network-on-a-chip architectures because of the physical faults introduced in post-manufacturing processes. Therefore, to overcome such failures in network-on-a-chip architectures, fault-tolerant routing is critical. In this article, a network adaptive fault-tolerant routing algorithm is proposed that enhances an efficient dynamic and adaptive routing algorithm. The proposed algorithm avoids livelocks because of its ability to select an alternate outport. It also bypasses congested regions of the network and balances the traffic load between outports that have an equal number of hop counts to the destination. Simulation results verified that in a fault-free scenario, the proposed solution outperformed a fault-tolerant XY by achieving a lower latency. At the same time, it attained a higher flit delivery ratio compared to the efficient dynamic and adaptive routing algorithm. Meanwhile, in a faulty network, the proposed algorithm reached a flit delivery ratio up to 18% higher while still consuming less power compared to the efficient dynamic and adaptive routing algorithm.


Introduction
The shrinking size of transistors to submicron levels has led to a large number of cores being combined onto a chip, known as a system-on-a-chip (SoC). The bus-based architectures of SoCs are not able to meet these growing and diverse communication requirements. According to Moore's law, the packing density of micron technology doubles every eighteen months. SoC architectures are unable to exploit the availability of these doubled processing elements (PEs) every eighteen months due to latency and power constraints [1].
Network-on-a-chip (NoC) architectures have evolved to overcome these growing challenges experienced by SoC architectures. An NoC's communication is based on packet routing networks instead of the wires and busses used in SoCs. The major contributions of this article are as follows:

1. The proposed network adaptive fault-tolerant routing (NAFTR) algorithm decreased the latency in fault-free scenarios by avoiding the congested regions in the network.

2. When there was a tie between two outports in terms of having a similar hop count toward a destination, the NAFTR algorithm ensured load balancing regarding route selection.

3. Moreover, the NAFTR algorithm increased the flit delivery ratio by selecting alternate outports to avoid livelocks in the network.
The rest of the article is organized as follows. Section 2 summarizes the related research articles. The problem statement is formulated in Section 3. Section 4 briefly describes the proposed solution. Section 5 presents the experimental findings of this study. The article is concluded in Section 6.

Related Research
Many research works have proposed efficient routing mechanisms for NoC architectures. XY is a non-adaptive routing scheme [11] and is well known because of its simplicity in NoC architectures. In the XY algorithm, packets are always first routed in the horizontal plane and then in the vertical plane to reach their destinations. This algorithm invariably routes packets via the shortest route. However, the XY algorithm is not able to avoid busy and congested links. Deterministic/partially adaptive routing algorithms [12] route packets while considering predetermined restricted turns to avoid livelocks and deadlocks in the network. Figure 1 depicts all the restricted turns in deterministic and partially adaptive routing algorithms. On the other hand, fully adaptive routing algorithms dynamically adapt according to the network conditions. These algorithms add some virtual or physical channels to attain deadlock-free communication [13,14]. Various fault-tolerant routing algorithms have been proposed [15]. A partially adaptive fault-tolerant routing algorithm built on a negative-first approach is proposed in Glass and Ni [16]. However, in multiple faulty node scenarios, this algorithm does not perform efficiently. Another partially adaptive fault-tolerant routing algorithm is proposed in Wu [17], which is an enhanced version of the odd-even routing algorithm. Analogous algorithms, pertinent to the partially adaptive category, are proposed in References [18,19]. Numerous other research works have also utilized virtual channels to achieve fault tolerance in NoCs [20,21]. The major drawback of virtual channel usage is the extra logic circuitry required for their implementation. Larger logic circuits yield an increased probability of faults occurring in the network, along with extra power requirements to operate them.
An adaptive fault-tolerant routing algorithm, which employs table-based routing for delivering packets from a particular source to the destination, is proposed in Schonwald et al. [22]. This algorithm is an enhanced version of force-directed wormhole routing (FDWR). Every node maintains a routing table to perform route decisions when forwarding packets to other nodes in the network. This algorithm is unable to ensure in-order delivery of packets. Additionally, building and maintaining the routing table yields an extra control overhead. An adaptive fault-tolerant routing algorithm is proposed in Singh et al. [23] that considers faults up to two hops away when performing route selection for a packet. However, this algorithm does not manage busyness or hot spot regions in the network. Based on adaptive route selection, a fault-tolerant scheme is proposed in Savio Tse et al. [24]; unfortunately, this algorithm also does not manage busy and congested regions of the network. Liu et al. proposed an efficient dynamic and adaptive routing (EDAR) algorithm [10]. For route calculations, EDAR considers busy, congested, and faulty links up to one hop away. However, because this route decision is based on the knowledge of only one hop, it may point to a congested region, which results in a further delay. Moreover, EDAR assigns the same priority weight to outports with an equal number of hops to a destination. Thus, EDAR is restricted regarding load balancing between available paths with an equal number of hops. A hybrid fault-tolerant routing algorithm (HFTRA) was proposed by Bishnoi et al.; however, HFTRA requires additional network power and area overheads because of the additional virtual channels employed. Additionally, this algorithm does not have a mechanism for fault identification [25]. Yang et al. [26] presented a fault-tolerant routing algorithm designed for a honeycomb-like topology.
This algorithm does not cater for 2D/3D mesh topologies. Furthermore, Moriam et al. [27] designed an analytic approach to conduct the reliability assessment of adaptive routing algorithms for NoCs. Melo et al. [28] proposed a finite state machine (FSM)-based router controller. Their study focused on mitigating the error propagation rate in the router controller. They did not cater for busyness and congestion avoidance mechanisms in their proposed algorithm. Zhang et al. proposed an improved fault-tolerant routing algorithm [29]. Their study primarily focused on mitigating multiple packet diversions, which may result in livelocks. Unfortunately, they did not consider busyness and congestion avoidance scenarios in their proposed work. Low-power and high-performance adaptive routing for on-chip designs is proposed in Xiang and Pan [30]. The study focused on bypassing k hops to deliver the packet to the final destination in fewer cycles. The study did not consider congestion and busyness avoidance mechanisms in its proposed solution. Another similar study, focusing on reducing the hop count toward a destination via the use of specialized channels known as transmission lines (TLs), was proposed in Deb et al. [31]. This study also did not consider congestion and busyness avoidance mechanisms in its proposed algorithm. Song et al. proposed uniform-minimal-first (UMF) routing [32]. UMF alternately selects between XYX and YXY routing. This alternate selection leads to load balancing of traffic across the network; however, this study unfortunately did not consider fault tolerance, congestion, or busyness avoidance in its proposed mechanism. Liu et al. [33] proposed a congestion-aware OE router, which employs fair arbitration for outport selection rather than random selection. This fair arbitration policy helps to obtain a lower latency and a higher throughput. This study is solely focused on congestion avoidance. Jin et al.
proposed a history-aware-adaptive routing algorithm called HARE [34]. HARE intends to solve the end-point congestion problem by identifying head-of-line blocking flits in buffers. HARE achieves higher throughput and lower latency because of its ability to separate head-of-line blocking flits. The study did not consider fault and busy port avoidance in the proposed mechanism. Table 1 summarizes the state-of-the-art routing algorithms of NoCs. The comparison identifies the primary focus of each study, the supported network topology, the simulation platform, and whether the study utilizes virtual channels to handle faults in the network.
The key features that are essential for a routing algorithm are simplicity, fault tolerance, congestion awareness, and busyness awareness. We selected EDAR for optimization because it possesses all these key features. We further optimized EDAR to propose NAFTR. NAFTR achieved a lower latency and higher throughput compared to EDAR, as indicated by the results presented in the experimental results section.

Problem Statement
EDAR is a valuable choice among the proposed fault-tolerant routing algorithms because of its simplicity, which leads to implementation ease and reduced latency. Moreover, EDAR also avoids congested and busy channels, along with faulty channels in the network. However, its route decision is based on the knowledge of only one hop. Therefore, the packet may reach a congested region or, in a few scenarios, arrive at a node that re-transmits the packet backward. In the scenario depicted in Figure 2a, node 5 needs to send a few packets toward node 7. At node 5, the east port has the smallest weight of 1 because it leads to the shortest route to the destination, only two hops away, whereas the south and north ports lead to a four-hop route from node 5 to the destination; therefore, they are assigned a weight of 2 each. Meanwhile, the west port leads to the longest route to the destination, which is seven hops from the current node; therefore, it has the highest weight of 3. Thus, node 5 selects the east outport for packet forwarding because of its lower weight. However, the east port will lead the packet to node 6, which falls in the congested region. Although EDAR tries to follow the shortest path to the destination, it leads the packet into a congested region, which results in an additional delay before reaching the destination. Consider another scenario, shown in Figure 2b, where node 6 has some packets to send to node 7. The shortest path to the destination uses the east port from node 6. However, the east port of node 6 is congested. The weight of the east port becomes 4 because an additional weight of 3 is added for a congested port (1 + 3 = 4), and the north and south ports are assigned the same weight of 2 each because they have the same hop count to the destination. EDAR does not have a mechanism to balance the traffic load among outports with a similar number of hops toward a destination.
Therefore, EDAR will not utilize the south port, and it will continue to select the north port until it is congested.
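The weight arithmetic of Figure 2b can be verified with a short sketch (the constant and variable names are illustrative, not from EDAR's implementation):

```python
# Reproducing the Figure 2b weights at node 6: the hop-count weight is
# 1 for the shortest (east) route and 2 for the equal-hop north/south
# routes; a congested port adds a penalty of 3.
CONGESTION_PENALTY = 3

east = 1 + CONGESTION_PENALTY   # congested shortest route -> 4
north = 2                       # four-hop route
south = 2                       # same hop count, same weight as north

# EDAR breaks no ties, so it keeps picking the first equal-weight port
# (north) until that port also becomes congested.
print(east, north, south)       # 4 2 2
```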



Proposed Solution
The 2D mesh is the most trivial topology used in NoCs. In this article, we have evaluated the proposed algorithm on a 2D mesh topology. Figure 3 depicts a traditional 2D mesh topology of an NoC, where each PE is linked to a router via a local port for communication. Each router is also connected to the available immediate next hop neighbors via west (W), east (E), south (S), and north (N) ports. Every PE is assigned an ID starting from 0. This ID is utilized to obtain the X and Y coordinates of a particular PE's location inside of an NoC network. Equations (1) and (2) can be used to calculate the X and Y coordinates of a particular PE.
X coordinate = ID mod no. columns (1)
Y coordinate = ID / no. columns (2)
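As a quick illustration, Equations (1) and (2) can be evaluated in a few lines (the function name and the `no_columns` parameter name are illustrative assumptions):

```python
# Mapping a PE's ID to its (X, Y) coordinates in a 2D mesh, per
# Equations (1) and (2): X is the column index (ID mod columns) and
# Y is the row index (integer division of ID by columns).
def pe_coordinates(pe_id: int, no_columns: int) -> tuple[int, int]:
    x = pe_id % no_columns    # Equation (1)
    y = pe_id // no_columns   # Equation (2)
    return x, y

# For an 8 x 8 mesh, PE 0 sits at (0, 0) and PE 10 at (2, 1).
print(pe_coordinates(10, 8))  # (2, 1)
```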
NAFTR virtually breaks down the network into eight regions labeled from D1 to D8. This virtual network division depends on the location of the destination PE relative to the current PE. The packet forwarding is done depending on the region the destination PE is located in (from D1 to D8). For every region, i.e., from D1-D8, for a particular packet, all of the outports of the router are assigned a preferred port (PP) number from PP1 to PP3. The port leading toward the shortest route to the destination is labeled as PP1, the ports leading toward the next shortest route are labeled as PP2, and the port leading toward the longest route is labeled as PP3. Figure 4 illustrates this virtual network division for an 8 × 8 NoC network. The whole network is virtually partitioned into eight regions labeled from D1 to D8. This virtual partitioning will be different for every current and destination PE pair. The current PE is the one that is in the process of making a forwarding decision for a particular packet. The current and source PEs can be the same, but this will not be true in all cases. At the green-colored source PE, the east port leads toward the shortest path to the destination PE; therefore, it is labeled as PP1. The south and north ports lead toward the next shortest route to the destination; therefore, they are marked as PP2. Similarly, the port leading toward the longest route to the destination is the west port; therefore, it is marked as PP3.


Modified EMBRACE Router Architecture
We have modified the emulating biologically-inspired architecture in hardware (EMBRACE) [35-38] router architecture. The EMBRACE router can detect busy, congested, and faulty channels efficiently. The modified EMBRACE router architecture is shown in Figure 5. A dotted black line surrounds the newly added components. Channel conditions, such as busy, congestion, and faults, are shared with the adaptive routing scheme (ARS). The ARS utilizes this information for the packet-forwarding decision. The adaptive arbitration policy (AAP) handles simultaneous requests for a particular outport. The AAP resolves conflicts via arbitration and also handles the allocation of virtual channels. The monitor module (MM) is responsible for monitoring the channel condition of the neighboring node. The MM outputs the appropriate signal depending on the channel condition, i.e., busy, congested, or faulty. Four MMs are responsible for monitoring all four adjacent node channels. The signals from these four MMs are conveyed to the ARS as faulty/congested/busy flags, which are required for the computation of the next outport. The outputs of all four MMs are connected via the AAP. If all three outgoing channels of a node are congested, then the MM declares the node's fourth, incoming channel as congested to its immediate neighbor as well, to prevent incoming packets from entering the congested region. To do so, one additional set of OR and AND gates is added at every MM. Figure 6 shows a scenario to further clarify the operation of the newly added AND and OR gates. Suppose that node 6 has its east, west, and south ports congested. Let us take a close look at the operation of the north port's MM at node 6. The MM of the north port has all three inputs of the AND gate set to 1 because the three other ports are congested. The AND gate's output becomes 1, but the north port itself is not congested; therefore, the north port's MM output is 0.
When this 0 and 1 are applied at the OR gate, its output becomes 1; therefore, node 6 will declare its incoming north port as congested to node 2 to prevent incoming packets from node 2 entering the congested region. Let us now look at the south port's MM operation for the same node. The AND gate of the south port's MM is given two 1's and one 0 as input because the east and west ports are congested but the north port is not. The output of the AND gate becomes 0. The output of the MM of the south port is 1 because the south port is congested; therefore, when this 1 and the 0 from the AND gate are applied at the OR gate, its output also becomes 1. In summary, whenever either the current channel is congested or all three remaining channels are congested, the MM declares its current channel as congested. This enables the neighboring router to avoid the congested region. The modified EMBRACE router requires one additional OR gate and one additional AND gate per port for this purpose.
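The OR/AND logic above reduces to a one-line condition; the function name and argument layout are illustrative assumptions, not the hardware interface:

```python
# A port is advertised as congested to the neighbor if the port itself
# is congested (MM output) OR all three remaining ports are congested
# (AND gate), combined by the OR gate.
def advertised_congested(port_congested: bool, other_three: list) -> bool:
    return port_congested or all(other_three)

# Node 6 example from Figure 6: east, west, and south congested.
# North port's MM: north itself is free, but the AND over the other
# three ports is 1, so north is declared congested to node 2.
assert advertised_congested(False, [True, True, True]) is True
# South port's MM: south is congested, so the flag is raised regardless.
assert advertised_congested(True, [True, True, False]) is True
```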


NAFTR Algorithm
The key annotations are explained in Table 2, and the proposed algorithm's working is illustrated via the pseudocode in Algorithm 1. The NAFTR algorithm performs the following steps when making a packet forwarding decision.
Step 1: Compare the X and Y coordinates of the destination node with those of the current node. If they are equal, then send the packet toward the local outport. If they are not equal, then perform step 2.
Step 2: Assign a preferred port number from PP1-PP3 to all legitimate outports. This decision is based on which region (from D1-D8) the destination PE falls in.
Step 3: Examine the condition of all legitimate outports. Assign a busy outport a weight of W_b = 2, a congested outport a weight of W_c = 3, and a faulty outport a weight of W_f = 10. Add W_b, W_c, W_f, and the preferred-port weight W_p together to get the final total weight for every legitimate outport.
Step 4: Search for the outport with the minimum total weight among all possible outports. If this outport is equivalent to the inport of the packet, then perform step number 5. Otherwise, forward the packet towards this outport.
Step 5: Search for the alternate outport leading toward the path with the second-lowest weight and forward the packet to that outport. If there are two possible candidates for the second-lowest total weight selection due to having the same hop count to the destination, then randomly select one outport among them. This random selection will ensure load balancing between outports having a similar distance in hops toward a destination.
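The steps above can be sketched as follows; the data structures, constant names, and helper functions are illustrative assumptions, not the article's pseudocode (Algorithm 1):

```python
import random

# Illustrative sketch of the NAFTR forwarding decision (Steps 2-5).
# Per the article: W_p is the preferred-port weight (PP1 = 1 ... PP3 = 3),
# W_b = 2 (busy), W_c = 3 (congested), and W_f = 10 (faulty).
W_BUSY, W_CONGESTED, W_FAULTY = 2, 3, 10

def total_weight(port):
    """Step 3: sum the preferred-port and channel-condition weights."""
    w = port["pp"]
    if port["busy"]:
        w += W_BUSY
    if port["congested"]:
        w += W_CONGESTED
    if port["faulty"]:
        w += W_FAULTY
    return w

def select_outport(ports, inport, rng=random):
    ranked = sorted(ports, key=total_weight)
    best_w = total_weight(ranked[0])
    # Step 4: outports tied at the minimum total weight, excluding the
    # packet's inport so the packet is never sent back where it came from.
    best = [p for p in ranked
            if total_weight(p) == best_w and p["name"] != inport]
    if best:
        # Random tie-break ensures load balancing between equal-hop routes.
        return rng.choice(best)["name"]
    # Step 5: the minimum-weight port was the inport; fall back to the
    # second-lowest total weight, again breaking ties randomly.
    rest = [p for p in ranked if p["name"] != inport]
    second_w = total_weight(rest[0])
    return rng.choice(
        [p for p in rest if total_weight(p) == second_w])["name"]
```

For the node 6 scenario of Figure 2b, the east port's total weight is 1 + 3 = 4, so the sketch picks the north or south port (both weight 2) with equal probability, which is the load-balancing behavior Step 5 describes.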

Latency Improvement Achieved Using NAFTR in Fault-Free Scenarios
NAFTR can lower the latency because of its ability to avoid congested regions. Let us revisit Figure 2a. Node 5 needs to send a few packets toward node 7. Three out of four outports of node 6 are congested; therefore, node 6 declares its west incoming port as congested. Now, the total weight of the east port at node 5 becomes 1 + 3 = 4. Therefore, node 5 selects the north port as the outport. This enables NAFTR to avoid congested regions in the network.

Load Balancing
EDAR assigns equal weights to outports with a similar hop distance toward a destination. It does not have a mechanism to break ties. EDAR selects the first port among those with the same hop count and continues to select that port until it is congested. This leads to one port being congested all of the time while the other port is not utilized to its full capability. NAFTR does not assign the same weight to all outports having the same hop distance toward a destination. Rather, it randomly selects between two outports with a similar hop count. Thus, it distributes the traffic with equal probability among outports with a similar hop distance toward a destination.

Livelock/Deadlock Avoidance
The following assumptions were used in this paper for the avoidance of deadlock and livelock: (a) a packet is absorbed when it reaches a destination, (b) a node is not allowed to send a packet to itself, and (c) the source and destination PEs fall in a connected region. The strategies used to handle deadlocks are deadlock prevention, deadlock avoidance, and recovery. Using a virtual channel (VC) at the router falls under the category of deadlock avoidance [39]. The EMBRACE router architecture uses a VC to avoid deadlocks in the network. A VC is implemented as a first in, first out (FIFO) queue. When a packet enters a physical channel, it is assigned to the VC. This packet stays in the VC until the next computed outport is idle and ready to receive this packet. The router does not allow the VC assignment in a closed path/loop, which avoids deadlocks from happening in the network in the first place.
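A minimal sketch of the FIFO virtual channel behavior described above (the class and method names are my own, not the EMBRACE implementation):

```python
from collections import deque

# A virtual channel as a FIFO queue: an incoming flit is assigned to
# the VC and stays there until the computed outport reports idle.
class VirtualChannel:
    def __init__(self):
        self.fifo = deque()

    def enqueue(self, flit):
        self.fifo.append(flit)          # flit assigned to the VC

    def try_forward(self, outport_idle: bool):
        # The flit leaves only when the next outport is idle and ready.
        if outport_idle and self.fifo:
            return self.fifo.popleft()
        return None

vc = VirtualChannel()
vc.enqueue("flit-0")
assert vc.try_forward(outport_idle=False) is None   # blocked, stays queued
assert vc.try_forward(outport_idle=True) == "flit-0"
```

The deadlock-avoidance property comes from the router refusing VC assignments that would form a closed loop, which this sketch does not model.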
For livelock avoidance, a data packet coming from a given direction is not allowed to return in the same direction. NAFTR selects the outport with the next-lowest weight for packet forwarding if the shortest route toward a destination is the route from where the packet entered the router. Thus, the packet will eventually reach the destination, although it may experience a longer path delay in the process of avoiding congested regions. In the case of higher fault rates in the network, re-routing constraint mechanisms [40] may be applied, which constrain the maximum number of re-routings to be performed for a given packet. When the number of re-routings exceeds that threshold, the packet is discarded. This may raise concerns regarding the quality of service. In that case, an automatic repeat request (ARQ) mechanism can be adopted to re-transmit the dropped packet through a different port.

Experimental Results
Figure 7 shows the latency vs. average flit delivery ratio comparison of the routing algorithms examined, which was done in a fault-free network scenario at different packet injection rates. The latency increased as the data rate increased in all traffic patterns. In the bit-reversal, bit-shuffle, and butterfly traffic patterns, the traffic was seriously imbalanced. As the data rate increased, it resulted in the formation of congested regions in the network. At higher data rates, FTXY had the highest latency because of its inability to avoid congested regions and congested ports in the network. At lower data rates, NAFTR had a latency equal to or higher than FTXY because of its adaptive nature. As the data rate increased, NAFTR experienced lower latency than FTXY because of its key feature of avoiding congested regions. NAFTR also balanced the traffic among outports with a similar number of hops to the destination. Moreover, FTXY and NAFTR maintained a higher flit delivery ratio than EDAR because of their ability to handle livelocks in the network.
Although EDAR had a lower average latency than FTXY and NAFTR, it achieved a lower flit delivery ratio. Due to EDAR's inability to select an alternate outport, the outport of a packet was occasionally the same as the inport of the packet, thus resulting in livelock in the network and consequently leading to a lower flit delivery ratio and non-monotonic behavior. Under the transpose traffic pattern in Figure 7d, NAFTR experienced slightly higher latency than FTXY because in the transpose traffic pattern, the traffic is not as severely imbalanced as in the case of bit-reversal, bit-shuffle, and butterfly patterns. Therefore, NAFTR followed longer paths to avoid rare busy and congested ports in the network, resulting in slightly higher average latency per channel at higher data rates.
Figure 8 shows the average latency per channel and the average flit delivery ratio comparison of FTXY, EDAR, and NAFTR at different fault rates under various synthetic traffic patterns. Figure 8a,b show that NAFTR achieved a higher average flit delivery ratio at higher fault rates because of its ability to select the next-lowest weight outport when the outport calculated was equal to the inport of a packet. NAFTR followed longer paths to avoid faults and congested regions in the network, which resulted in a slight increase in average latency per channel compared to FTXY. EDAR had the highest average latency per channel because of its inability to select an alternate outport when the calculated outport was the same as the inport of the packet, which resulted in livelock, consequently leading to a higher latency in the network.
The comparison results under the bit-shuffle, butterfly, and transpose traffic patterns in Figure 8c-h show that NAFTR outperformed FTXY and EDAR by offering a higher flit delivery ratio at higher fault rates.

Figure 9a shows the total average network power consumed by FTXY, EDAR, and NAFTR in the fault-free network scenarios. The results indicate that EDAR and NAFTR consumed more power than FTXY, as they require additional circuitry to avoid busy and congested ports and regions in the network. On average, EDAR consumed 130% more power than FTXY, while NAFTR consumed only 33% more power than FTXY. Moreover, NAFTR consumed 41% less power than EDAR on average because of its ability to avoid congested regions in the network; passing through a congested region requires additional power to hold the flits in intermediate virtual channels until the route is clear and the packet can be forwarded to the next hop.

Figure 9b shows the total average network power consumed by EDAR, FTXY, and NAFTR in the faulty network scenarios. The results show that EDAR and NAFTR consumed 76% and 67% more power than FTXY, respectively, as they consumed additional power to avoid congested and busy channels. However, NAFTR consumed 5% less total average network power than EDAR while offering a higher flit delivery ratio than both FTXY and EDAR, as shown in Figure 8.
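As a quick arithmetic consistency check (not part of the paper's evaluation toolchain), normalizing total average network power to FTXY = 1.0 reproduces the relative savings of NAFTR over EDAR quoted above:

```python
# Normalize total average network power to FTXY = 1.0 (illustrative check only).
ftxy = 1.0
edar_ff, naftr_ff = ftxy * (1 + 1.30), ftxy * (1 + 0.33)  # fault-free: +130%, +33%
edar_f, naftr_f = ftxy * (1 + 0.76), ftxy * (1 + 0.67)    # faulty:     +76%,  +67%

saving_ff = 1 - naftr_ff / edar_ff  # NAFTR vs. EDAR, fault-free network
saving_f = 1 - naftr_f / edar_f     # NAFTR vs. EDAR, faulty network

print(round(saving_ff * 100))  # 42, close to the reported 41% (rounding of the raw data)
print(round(saving_f * 100))   # 5, matching the reported 5%
```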

Hardware Requirement Analysis
The modified EMBRACE router architecture requires one OR gate and one AND gate per outport to signal neighboring nodes about congested regions so that they can route around them. For a 2D mesh topology, this amounts to four OR gates and four AND gates per router. A four-input AND gate requires eight complementary metal-oxide-semiconductor (CMOS) transistors, while a two-input OR gate requires six, for a total of 56 additional CMOS transistors per router. For a 5 × 5 network, the total overhead is 1400 CMOS transistors, which is negligible given that modern chips integrate millions of transistors.
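The overhead figures above follow directly from the stated gate counts; a minimal sketch of the calculation (transistor counts per gate taken from the paragraph above):

```python
# Per-router congestion-signaling overhead for a 2D mesh (values from the text).
AND_TRANSISTORS = 8   # four-input CMOS AND gate
OR_TRANSISTORS = 6    # two-input CMOS OR gate
PORTS_PER_ROUTER = 4  # one AND and one OR gate per outport; four outports in a 2D mesh

per_router = PORTS_PER_ROUTER * (AND_TRANSISTORS + OR_TRANSISTORS)
network_total = 5 * 5 * per_router  # a 5 x 5 mesh has 25 routers

print(per_router, network_total)  # 56 1400
```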

Conclusions
In this article, we propose a routing algorithm (NAFTR) for NoC interconnections. The proposed algorithm introduces a three-fold optimization of EDAR. First, NAFTR introduces load balancing of the traffic between routes that are an equal distance from the destination: when all outports lead to routes with an equal hop count to a destination, NAFTR randomly selects an outport for packet forwarding. Second, NAFTR avoids livelocks by choosing an alternate route: when there is a possibility of a livelock, NAFTR calculates the outport with the next-lowest weight and forwards the packet to that alternate outport. Lastly, NAFTR is able to avoid congested regions in the network, which it achieves through the use of one AND gate and one OR gate per port at the router. NAFTR was extensively compared with EDAR and FTXY under numerous synthetic traffic patterns and variable fault rates. The simulation results illustrated that NAFTR reduced the latency because of its ability to avoid congested regions, and it also achieved a higher flit delivery ratio in a fault-free network due to its ability to avoid livelocks. Moreover, NAFTR achieved a flit delivery ratio up to 18% higher in the presence of multiple faults, while consuming less power than EDAR. In the future, we plan to extend NAFTR to other network topologies, such as 3D, honeycomb, and torus topologies, with the goal of making NAFTR a routing algorithm that can be configured on the fly for most of the widely used topologies in NoCs. NAFTR can also be extended to work with wireless NoCs, where congestion and channel busyness can be critical factors for wireless routers. Additionally, NAFTR can be implemented on real hardware to further substantiate its performance gains.
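The first two optimizations summarized above can be illustrated with a hypothetical outport-selection routine. This is a sketch only: the function and field names are illustrative, and the real NAFTR router derives the per-outport weights from hop counts and the gate-based congestion signals rather than taking them as input.

```python
import random

def select_outport(weights, inport):
    """Pick an outport given per-outport weights (lower = closer / less congested).

    Illustrative sketch of the NAFTR selection rules described above, not the
    actual router implementation.
    """
    best = min(weights.values())
    candidates = [p for p, w in weights.items() if w == best]
    # Rule 1: load-balance by choosing randomly among equally weighted outports.
    choice = random.choice(candidates)
    if choice == inport:
        # Rule 2: avoid livelock -- never forward a packet back out of its
        # inport; fall back to the outport with the next-lowest weight.
        others = {p: w for p, w in weights.items() if p != inport}
        choice = min(others, key=others.get)
    return choice

# Example: 'south' has the lowest weight but is the packet's inport,
# so the router falls back to the next-lowest-weight outport.
print(select_outport({"north": 2, "east": 2, "south": 1}, "south"))  # north
```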
