Novel Full TMR Placement Techniques for High-Speed Radiation Tolerant Digital Integrated Circuits

This paper presents a novel physical implementation methodology for high-speed Triple Modular Redundant (TMR) digital integrated circuits for harsh radiation environment applications. An improved distributed approach is presented to constrain the redundant branches of TMR digital logic cells using repetitive, interleaved micro-floorplans. To optimally constrain the placement of both sequential and combinational cells, the TMR netlist is used to segment the logic into unrelated groups, which can share placement regions without compromising reliability. The technique was evaluated in a 65 nm bulk CMOS technology and compared to conventional methods.


Introduction
Single Event Effects (SEEs) are undesired erroneous effects in digital integrated circuits caused by ionizing radiation. With CMOS device scaling, SEEs have become an increasingly important reliability concern, leading to severe soft-error rates in advanced systems [1]. Soft errors are a concern not only in nuclear instrumentation and space applications, but also in critical commercial applications such as autonomous transport systems. As the complexity and clock frequencies of digital circuits grow, the overhead of redundancy should be kept to a minimum. This paper addresses both Single Event Transients (SETs) and Single Event Upsets (SEUs). SETs are transient erroneous signals caused by the charge that an incident particle generates and that is collected by the transistors of a combinational cell. They recover over time, as can be seen in Figure 1 [2]. However, if the SET propagates to a sequential cell such as a flip-flop and arrives within the setup and hold window around a clock edge, it is latched. This leads to an incorrect logical state, also known as an SEU, as shown in Figure 1. The probability of such latching increases proportionally with clock frequency [3,4]. SEUs can also occur when charged particles directly hit sequential circuits such as latches and flip-flops. Once a register suffers a bit-flip, the erroneous value may remain in the digital system and can even propagate to other digital modules, resulting in a failure. For example, an SEU could change the state of a Finite State Machine (FSM), temporarily impacting the entire system. SEUs can thus originate from direct upsets in the registers or from SETs in the combinational logic that are latched during clocking. Triple Modular Redundancy (TMR) can be used to protect digital logic from SEEs. It uses redundant logic with majority voters to correct logic signals [5].
TMR only works if at most one of the redundant copies is affected at a time; hence, Multi Bit Upsets (MBUs) in common logic signals can be catastrophic for TMR. Complementary metal oxide semiconductor (CMOS) technologies have now scaled to the point that MBUs have become a serious concern, since a single particle can affect multiple gates simultaneously [6][7][8]. This was less important in older CMOS technologies, where a single particle only affected a single digital cell. With proper placement techniques, however, fault tolerance can be ensured without compromising speed or power consumption. This is the problem addressed in this paper.
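The following minimal Python sketch makes the voting principle and its MBU weakness concrete. It models a bitwise 2-out-of-3 majority voter on integer-encoded words. The function name `majority` and the 4-bit example value are illustrative assumptions, not part of the original design flow.

```python
def majority(a: int, b: int, c: int) -> int:
    # Each output bit is 1 when at least two of the three input bits are 1.
    return (a & b) | (b & c) | (a & c)


golden = 0b1011

# A single upset in one branch (bit 2 flipped in copy B) is outvoted.
single = majority(golden, golden ^ 0b0100, golden)

# The same bit flipped in two branches (e.g. one particle hitting
# cells of branches A and B) is no longer corrected.
double = majority(golden ^ 0b0100, golden ^ 0b0100, golden)

print(f"{single:04b}")  # 1011
print(f"{double:04b}")  # 1111
```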
Historically, several methods were developed to address trade-offs in TMR designs such as power consumption and area efficiency. Full TMR is the most robust and most complete form of redundancy. In this approach, the flip-flops, the clock tree and the combinational logic cells are all triplicated [9,10]. Its drawbacks are the large number of resources (digital gates) and the high power consumption. Nevertheless, it is the most solid and secure form of TMR. A competing method is temporal redundancy [11]. In this method, only the flip-flops are triplicated and they are clocked by three skewed clocks, while the combinational logic is not redundant. The skew between the clocks must be larger than any possible SET, so that at most one flip-flop can latch a given SET. This method has proven its usefulness in many applications [12]. Its major drawback, however, is the limited clock frequency. The clock skew imposes strict timing constraints on the design, typically resulting in sub-GHz timing performance. Hence, many high-speed mixed-signal digital modules are based on the original full TMR approach. Error Detection and Correction (EDAC) codes can also be used for radiation hardness assurance. They are usually placed around the sequential cells to correct any SEUs. Depending on the coding scheme, EDAC is also vulnerable to MBUs. These must be mitigated as well, which may be more difficult to guarantee than with TMR. The encoding and decoding logic also adds timing overhead, which might significantly slow down the critical datapaths.
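The skew requirement of temporal redundancy can be checked with a small timing model. The Python sketch below is a simplified illustration that rests on several assumptions. Each flip-flop samples at an ideal instant, so setup and hold windows are ignored. The three clocks are spaced by a fixed skew, and the SET is modelled as a rectangular pulse. The helper names and the picosecond values are hypothetical.

```python
def latched_copies(set_start_ps: int, set_width_ps: int,
                   clock_edge_ps: int, skew_ps: int) -> int:
    """Number of the three skewed flip-flops whose (ideal) sampling instant
    falls inside the SET pulse [set_start, set_start + width)."""
    samples = [clock_edge_ps + k * skew_ps for k in range(3)]
    return sum(set_start_ps <= t < set_start_ps + set_width_ps for t in samples)


def worst_case_latched(set_width_ps: int, skew_ps: int,
                       clock_edge_ps: int = 1000) -> int:
    """Sweep every SET start time that can overlap one of the sampling instants."""
    starts = range(clock_edge_ps - set_width_ps, clock_edge_ps + 2 * skew_ps + 1)
    return max(latched_copies(s, set_width_ps, clock_edge_ps, skew_ps)
               for s in starts)


print(worst_case_latched(set_width_ps=200, skew_ps=300))  # 1: the vote masks the SET
print(worst_case_latched(set_width_ps=400, skew_ps=300))  # 2: two copies latch it
```

When the skew is larger than the pulse width, at most one sampling instant can fall inside the pulse, so the majority vote masks the SET. A real design must also account for the setup and hold window, which makes the constraint stricter.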
As indicated above, TMR only works if single errors occur. In deep-submicron technologies, proper physical implementation is required to ensure that no MBU can affect more than one redundant copy (A, B or C) of the same TMR logic. This paper presents an innovative, optimised physical placement methodology for full TMR designs.

3 Block Approach
One of the most frequently used methods for the physical implementation of TMR designs is to separate the A, B and C logic into 3 different areas. A floorplan is created with 3 blocks (named A, B and C), as shown in Figure 2. All flip-flops are constrained to their assigned block (A, B or C), and the combinational logic and the clock tree intrinsically follow the placement of the flip-flops. However, each voted sequential net has 6 cross-domain interconnections (voter inputs), leading to long nets that traverse the entire floorplan. As a result, the power consumption increases and the nets between the flip-flops become congested.
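The count of six cross-domain voter inputs per voted net follows from full TMR's voting structure: each of the three voters receives the outputs of all three flip-flops. The Python sketch below enumerates these connections. As a crude illustration only, it also estimates their total vertical span in a 3 block floorplan stacked A, B, C. This estimate assumes centre-to-centre distances and a hypothetical block height of 100 µm.

```python
from itertools import product
from typing import List, Tuple

BRANCHES = "ABC"
Connection = Tuple[str, str]  # (source flip-flop branch, voter branch)


def voter_connections() -> List[Connection]:
    """Every flip-flop output of one voted net feeds the voter of every branch."""
    return list(product(BRANCHES, BRANCHES))


def cross_domain(connections: List[Connection]) -> List[Connection]:
    """Keep only the connections that leave their own branch."""
    return [(src, dst) for src, dst in connections if src != dst]


def three_block_span(connections: List[Connection], block_height_um: float) -> float:
    """Crude estimate: blocks stacked A, B, C, distances taken centre to centre."""
    index = {b: i for i, b in enumerate(BRANCHES)}
    return sum(abs(index[s] - index[d]) * block_height_um for s, d in connections)


conns = cross_domain(voter_connections())
print(len(conns))                                      # 6
print(three_block_span(conns, block_height_um=100.0))  # 800.0
```

The block height grows with the design, so this span grows too. This is the scaling problem that the interleaved floorplans address.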
As the design size increases, the nets become longer and the routing more complex. This increases routing congestion and raises power consumption, because the place-and-route tools insert additional buffers in the cross-domain voted nets to meet timing constraints. Power consumption and routing complexity are therefore the main concerns, limiting the usability of this approach to small or low-frequency designs.

Advanced Placement Methods
In this section, two physical implementation schemes that overcome these power and area trade-offs are discussed: the interleaved method [13] and an improved interleaved method.

Interleaved Floorplan
One of the limitations of the 3 block floorplan is that the place-and-route tools have little freedom to place critical logic relatively close together (while respecting the minimal spacing). The interleaved approach, presented in Reference [13], uses a semi-distributed placement method to give the place-and-route tools maximal freedom to optimize the design. It is based on the conventional 3 block approach but constrains the placement to many small interleaved A, B and C sections, as shown in Figure 2. Instead of 3 large blocks, there are multiple small repeating regions, allowing cells of the A-C branches to be placed at different vertical positions. Each region has the same fixed height, and the spacing between the regions ensures that an MBU cannot affect more than one branch. As the design size grows, the height of the regions does not change; only the number of vertical regions increases. Consequently, vertical connections between voters always cross the same narrow placement region and have equal lengths, regardless of the design size. The place-and-route tool therefore has much more freedom to place the cells vertically and much closer to each other. This is a significant improvement compared to the 3 block implementation, where voter connections have to cross a significant portion of the design. To constrain the flip-flops and the datapath cells, a trace-back algorithm was used to find the logic tree corresponding to each source sequential element [13].
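Reference [13] does not prescribe a data model here, so the following Python sketch illustrates only the general idea of a trace-back. Starting from the input of each flip-flop, the sketch walks the netlist backwards through combinational cells until it reaches other registers. Every cell found inherits the branch (A, B or C) of the flip-flop it feeds. The fan-in dictionary, the `ff_` naming convention and the helper names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List


def trace_back(ff: str, fanin: Dict[str, List[str]],
               is_sequential: Callable[[str], bool]) -> List[str]:
    """Breadth-first walk from the input of `ff` back through combinational
    cells; registers terminate the walk because they root their own trees."""
    cone: List[str] = []
    seen = {ff}
    queue = deque(fanin.get(ff, []))
    while queue:
        cell = queue.popleft()
        if cell in seen or is_sequential(cell):
            continue
        seen.add(cell)
        cone.append(cell)
        queue.extend(fanin.get(cell, []))
    return cone


def assign_branches(ff_branch: Dict[str, str], fanin: Dict[str, List[str]],
                    is_sequential: Callable[[str], bool]) -> Dict[str, str]:
    """Give every combinational cell the branch of the register it feeds."""
    branch_of = dict(ff_branch)
    for ff, branch in ff_branch.items():
        for cell in trace_back(ff, fanin, is_sequential):
            if branch_of.setdefault(cell, branch) != branch:
                raise ValueError(f"{cell} feeds registers of branches "
                                 f"{branch_of[cell]} and {branch}")
    return branch_of


# Hypothetical triplicated 1-bit toggle register: each branch has its own
# voter (inputs from all three flip-flops) followed by an inverter.
fanin: Dict[str, List[str]] = {}
for b in "ABC":
    fanin[f"ff_{b}"] = [f"inv_{b}"]
    fanin[f"inv_{b}"] = [f"vote_{b}"]
    fanin[f"vote_{b}"] = ["ff_A", "ff_B", "ff_C"]

branches = assign_branches({f"ff_{b}": b for b in "ABC"}, fanin,
                           lambda c: c.startswith("ff_"))
print(branches["vote_B"], branches["inv_C"])  # B C
```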

Improved Interleaved Method
The main drawback of the interleaved method is the space lost between the placement sections to ensure proper spacing between TMR branches. The improved interleaved method applies the same principle but recovers the lost space by filling it with unrelated digital logic. Again, this semi-distributed placement gives the place-and-route tools maximal freedom to optimise the design, as shown in Figure 2. In the proposed method, the floorplan is made using 6 physical constraint groups (A1, A2, B1, B2, C1, C2), collectively denoted ABC1 (A1, B1, C1) and ABC2 (A2, B2, C2). Each group has a height equal to or larger than the spacing distance required to prevent MBUs and occupies the entire width of the design. Vertically, the groups repeat to fill the design space (e.g., A1-A2-B1-B2-C1-C2-A1-A2-etc.). Each triplicated TMR logic path is placed either in ABC1 or in ABC2. As such, each set of groups acts as a spacer for the other. Cells in adjacent groups are allowed to share an upset because they do not have a common datapath. To maximise area efficiency, TMR paths that do not share a common combinational path, and are thus allowed to share multi-cell upsets, are balanced between ABC1 and ABC2.
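A short Python sketch can illustrate the repeating group pattern and show why it keeps the redundant copies of each path apart. It stacks full-width regions from bottom to top. It then checks that regions of different branches within the same set (ABC1 or ABC2) are separated by at least the MBU spacing distance. The 7.2 µm region height and spacing are taken from the evaluation in this paper. The 86.4 µm design height, the integer nanometre units and the function names are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Region = Tuple[str, int, int]  # (group name, y_bottom_nm, y_top_nm)

IMPROVED = ("A1", "A2", "B1", "B2", "C1", "C2")


def interleaved_regions(design_height_nm: int, region_height_nm: int,
                        pattern: Sequence[str] = IMPROVED) -> List[Region]:
    """Stack full-width placement regions by repeating `pattern` bottom to top."""
    count = design_height_nm // region_height_nm
    return [(pattern[i % len(pattern)], i * region_height_nm,
             (i + 1) * region_height_nm) for i in range(count)]


def spacing_ok(regions: List[Region], min_spacing_nm: int) -> bool:
    """True if regions of different branches (first letter) in the same set
    (remaining characters) are always at least `min_spacing_nm` apart."""
    for (g1, lo1, hi1), (g2, lo2, hi2) in combinations(regions, 2):
        if g1[0] != g2[0] and g1[1:] == g2[1:]:
            gap = max(lo2 - hi1, lo1 - hi2)
            if gap < min_spacing_nm:
                return False
    return True


rows = interleaved_regions(86_400, 7_200)
print([g for g, _, _ in rows[:8]])  # ['A1', 'A2', 'B1', 'B2', 'C1', 'C2', 'A1', 'A2']
print(spacing_ok(rows, 7_200))      # True
# Without spacers, a plain A-B-C stack places A directly next to B:
print(spacing_ok(interleaved_regions(86_400, 7_200, ("A", "B", "C")), 7_200))  # False
```

The check also shows why the original interleaved method needs explicit gaps. A plain A-B-C stack places different branches in adjacent regions. In the A1-A2-B1-B2-C1-C2 pattern, by contrast, every such pair is separated by a region of the other set.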
The advantage of this approach is the elimination of lost space, as can be seen in Figure 2. To balance the logic between ABC1 and ABC2, a segmentation algorithm detects whether a logic tree is connected to a tree already assigned to ABC1 or ABC2. If one of the cells in a logic tree is already in ABC1 or ABC2, the entire logic tree is placed in that group; otherwise, its cells are placed in the least filled group. The total area of each group is continuously tracked in order to balance the groups in terms of area. The segmentation algorithm is shown graphically in Figure 3. The time needed for the segmentation algorithm is negligible compared to the run time of the place-and-route tools. Tracing back all combinational cells from the netlist to the three A-B-C branches takes substantially longer than the segmentation itself, but is also negligible compared to place-and-route.
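The segmentation step can be sketched in Python as follows. This is a hypothetical variant of the procedure described above, not the authors' implementation. Instead of checking tree by tree whether a cell is already assigned, it first uses a union-find structure to merge all logic trees that share a cell. This prevents a tree that touches two previously separate trees from being split across ABC1 and ABC2. Each merged component is then assigned to the group with the smaller total area. Each cell name stands for a triplicated cell, and all names and areas are illustrative.

```python
from typing import Dict, List, Set, Tuple


def segment(trees: List[Set[str]],
            area: Dict[str, float]) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Split traced logic trees into two area-balanced groups, ABC1 and ABC2.

    Trees that share a cell end up in the same group; each merged component
    goes to whichever group currently has the smaller total area.
    """
    parent = list(range(len(trees)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner: Dict[str, int] = {}  # first tree seen containing each cell
    for i, tree in enumerate(trees):
        for cell in tree:
            if cell in owner:
                parent[find(i)] = find(owner[cell])  # connected: merge trees
            else:
                owner[cell] = i

    components: Dict[int, Set[str]] = {}
    for i, tree in enumerate(trees):
        components.setdefault(find(i), set()).update(tree)

    totals = {"ABC1": 0.0, "ABC2": 0.0}
    group_of: Dict[str, str] = {}
    for cells in components.values():
        target = min(totals, key=totals.get)  # least filled group
        for cell in cells:
            group_of[cell] = target
        totals[target] += sum(area[cell] for cell in cells)
    return group_of, totals


trees = [{"a", "b"}, {"c"}, {"b", "d"}, {"e"}]  # trees 0 and 2 share cell "b"
area = dict.fromkeys("abcde", 1.0)
group_of, totals = segment(trees, area)
print(group_of["d"], totals)  # ABC1 {'ABC1': 3.0, 'ABC2': 2.0}
```

The final placement region of a cell would combine this result with the branch assignment from the trace-back. It would be the branch letter followed by the digit of the group. For example, a B-branch cell of a tree in ABC2 would go to B2.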

Simulated Analysis
Several comparative studies were performed with the interleaved method, the improved interleaved method and the standard 3 block approach to evaluate the efficiency and performance of the new placement and floorplanning technique. As benchmark, a design with eight identical and independent high-speed counters was used. To introduce more complex standardised datapaths, the counter widths and the number of counters in the benchmark were varied. Larger counter widths represent a more complex datapath, while more counters represent a larger overall design. This allows us to evaluate whether the techniques scale with increasing design size. The designs were implemented and analyzed using the Cadence Innovus Computer Aided Design (CAD) tools. To guarantee a timing-critical design, the timing constraints were chosen near the limits of the technology. In the analysis, power consumption, net length, net capacitance and routing density were evaluated for each method. The slice height and spacing of the interleaved techniques were chosen as 7.2 µm, and the 3 block technique uses a 7.2 µm block spacing. This value is aligned with the standard cell row height. Finally, the timing, power and region reports were extracted from the place-and-route tools. Figure 4 shows the routed designs of 8 × 16 bit counters for the 3 different methods. For both interleaved techniques, we did not observe a considerable variation in performance across counter dimensions. Compared to the standard 3 block strategy, the proposed interleaved and improved interleaved methods clearly result in considerably reduced routing complexity. In particular, vertical routing becomes much simpler because the place-and-route tool has more freedom to place the standard cells closer to each other. As expected, the difference in routing complexity between the interleaved and the improved interleaved method is relatively small.
However, the improved method achieves a higher area efficiency (standard cell density). The Amoeba view of the design is shown in Figure 5, where each colour represents a triplicated counter. In the 3 block method, the counters are distributed across the 3 bulky blocks. In the interleaved method, the counters are grouped much better because the cells can be placed more closely together. The improved interleaved method additionally benefits from independent combinational paths acting as spacers for each other, which maximises area efficiency. The distances between corresponding cells of the redundant TMR branches are shown in Figure 6. The distances in the 3 block implementation are substantially larger than in both interleaved methods. In the interleaved methods, most cells are spaced within 45 µm, corresponding to approximately 3 elementary vertical interleaved banks. This shows that the placement engine has more freedom to put cells closer together without compromising radiation hardness. For the 3 block technique, the distribution of distances between A-branch cells and their B- or C-counterparts shows two peaks, corresponding to the A-B and the A-C distance.
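The distance analysis of Figure 6 can be reproduced from placement coordinates along the following lines. The Python sketch assumes a hypothetical data layout in which each triplicated cell maps to the (x, y) coordinates of its A, B and C copies in µm. It uses Euclidean distance and flags any pair closer than the 7.2 µm spacing used in this study. The cell name and coordinates are invented.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Distance = Tuple[str, str, float]  # (cell, branch pair, distance in µm)


def branch_distances(placement: Dict[str, Dict[str, Point]]) -> List[Distance]:
    """Distance from each A-branch cell to its B and C counterparts."""
    result: List[Distance] = []
    for cell, pos in placement.items():
        for other in ("B", "C"):
            result.append((cell, "A-" + other, math.dist(pos["A"], pos[other])))
    return result


def too_close(distances: List[Distance], min_spacing_um: float) -> List[Distance]:
    """Pairs whose copies are closer than the MBU spacing distance."""
    return [d for d in distances if d[2] < min_spacing_um]


placement = {
    "cnt0_reg3": {"A": (10.0, 0.0), "B": (10.0, 15.0), "C": (40.0, 40.0)},
}
dists = branch_distances(placement)
print([d for _, _, d in dists])  # [15.0, 50.0]
print(too_close(dists, 7.2))     # []
```

A histogram of these distances over all cells would correspond to Figure 6. In the 3 block case, the two peaks would come from the A-B and A-C pairs.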
A comparison of the net length distributions is shown in Figure 7. The histogram demonstrates that, due to the voting interconnections, a significant fraction of the nets in the 3 block method have a length of 1/3 to 1/2 of the design size. This peak is no longer present in the proposed interleaved techniques, where most nets have a length of 25 µm or less, although this is design specific. However, the interleaved implementations still contain a few long nets. Comparing the pre- and post-Clock Tree Synthesis (CTS) histograms shows that these longer connections originate from the clock tree. In the interleaved implementations, clock trees A-C are distributed throughout the entire design, whereas in the 3 block approach each clock tree is placed locally in its own region. This is shown in Figure 8, where each colour represents one of the clock branches.
The average metal density of the different metal layers is shown in Figure 9. M1 and all horizontal layers show no significant difference, since these layers are not used to interconnect cells. In contrast, M2 and M4 show a significant difference due to the routing between voter cells. The density in the vertical routing layer M2 is reduced by almost half compared to the 3 block method, and there is almost no metal in M4, indicating the significance of the proposed strategies. In M3, the density of the proposed methods increases slightly. The improved implementation achieves a higher gate density than the interleaved one, which also increases the local routing density, mainly in M3. In addition, the 3 block method uses much taller placement blocks, which results in a more favourable vertical distribution. The interleaved methods, by contrast, use many more, shorter blocks, which results in a different vertical placement. The key result, however, is the reduction of extremely long M4 nets across the floorplan. A comparison of the power consumption of the 8 × 16 bit counter is shown in Table 1. These numbers were extracted after CTS and routing. The internal power is the power consumption of the unloaded standard cells. The switching power is the dynamic power due to the switching of the capacitive loads (cells and nets). The total capacitance is the sum of all net capacitances and cell input capacitances. As expected, the internal power does not change significantly, since the design remains almost identical. The only difference between the methods lies in the inserted buffers, which represent only a small fraction of the total internal power consumption. The total net lengths of both interleaved methods are significantly smaller than that of the 3 block approach, owing to the greater placement freedom of the place-and-route tools.
As a result, the total capacitance of the design decreases proportionally, and so does the dynamic power consumption. Since the main reduction results from avoiding long voter interconnects, the improvements become more significant as the design size increases. The technique therefore scales well to larger digital designs. These results were obtained in a 65 nm CMOS technology, but the methodology also scales well to smaller CMOS nodes. Firstly, designs in smaller nodes are often more complex, which significantly increases the need for optimal placement. Secondly, smaller nodes are more susceptible to SEEs, so the proposed methods will become increasingly necessary. Thirdly, routing becomes the dominant contributor to power consumption in deep-submicron technologies, so the improvement offered by the proposed methods will grow as devices scale down.
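The proportionality between total capacitance and switching power follows from the first-order CMOS dynamic power model, given here as a worked relation. Here α is the switching activity factor, V_DD the supply voltage and f_clk the clock frequency. Two simplifying assumptions are made: C_tot is split into net and input capacitance, and the net capacitance is approximated by a per-unit-length capacitance c_w times the total net length L_tot. Some texts also include a factor 1/2, depending on how α is defined.

```latex
P_{\mathrm{sw}} = \alpha \, C_{\mathrm{tot}} \, V_{DD}^{2} \, f_{\mathrm{clk}},
\qquad
C_{\mathrm{tot}} = C_{\mathrm{net}} + C_{\mathrm{in}} \approx c_w \, L_{\mathrm{tot}} + C_{\mathrm{in}}

\frac{P_{\mathrm{sw}}^{\mathrm{interleaved}}}{P_{\mathrm{sw}}^{\mathrm{3\,block}}}
  = \frac{C_{\mathrm{tot}}^{\mathrm{interleaved}}}{C_{\mathrm{tot}}^{\mathrm{3\,block}}}
  \qquad \text{for equal } \alpha,\ V_{DD},\ f_{\mathrm{clk}}
```

The cell netlist, and hence C_in, is almost identical for all three methods. The reduction in switching power at equal activity, supply voltage and frequency is therefore driven mainly by the reduction in total net length.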

Conclusions
The major advantage of this distributed placement approach is that place-and-route tools have more freedom to distribute logic across the floorplan. In contrast to the 3 block approach, interconnections between voters do not need to cross a large central block, which causes major routing complexity and power consumption. The total net length is drastically reduced because connected logic can be placed more closely together while still ensuring the minimal spacing required against MBUs. As a consequence, the switching power is reduced. The proposed improved distributed method balances the placement between ABC1 and ABC2 and uses one group as MBU spacer for the other. This further improves area efficiency compared to the previously reported interleaved placement strategy.
Author Contributions: Conceptualisation, K.A. and J.P.; methodology, K.A. and J.P.; validation, K.A. and J.P.; investigation, K.A. and J.P.; writing-original draft preparation, K.A. and J.P. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by FWO.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.