Aggressive scaling in deep nanometer technology enables chip multiprocessor design facilitated by the communication-centric architecture provided by Network-on-Chip (NoC). At the same time, it brings considerable challenges in reliability because a fault in the network architecture severely impacts the performance of a system. To deal with these reliability challenges, this research proposed NoCGuard, a reconfigurable architecture designed to tolerate multiple permanent faults in each pipeline stage of the generic router. NoCGuard router architecture uses four highly reliable and low-cost fault-tolerant strategies. We exploited resource borrowing and double routing strategy for the routing computation stage, default winner strategy for the virtual channel allocation stage, runtime arbiter selection and default winner strategy for the switch allocation stage and multiple secondary bypass paths strategy for the crossbar stage. Unlike existing reliable router architectures, our architecture features less redundancy, more fault tolerance, and high reliability. Reliability comparison using Mean Time to Failure (MTTF) metric shows 5.53-time improvement in a lifetime and using Silicon Protection Factor (SPF), 22-time improvement, which is better than state-of-the-art reliable router architectures. Synthesis results using 15 nm and 45 nm technology library show that additional circuitry incurs an area overhead of 28.7% and 28% respectively. Latency analysis using synthetic, PARSEC and SPLASH-2 traffic shows minor increase in performance by 3.41%, 12% and 15% respectively while providing high reliability.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited