This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Power is a primary concern in modern circuits. Clock distribution networks, in particular, are an essential element of a synchronous digital circuit and a significant power consumer. Clock distribution networks are subject to clock skew due to process, voltage, and temperature (PVT) variations and load imbalances. A target skew between sequentially-adjacent registers can be obtained in a balanced low power clock tree using techniques such as buffer and wire sizing. Existing skew mitigation techniques in tree-based clock distribution networks, however, are not efficient in coping with post-design variations, whereas the latest non-tree mesh-based solutions reliably handle skew variations, albeit with a significant increase in dissipated power. Alternatively, crosslink-based methods provide low power and variation-efficient skew solutions. Existing crosslink-based methods, however, only address skew at the network topology level and do not target low power consumption. Different methods to manage skew and skew variations within tree and non-tree clock distribution networks are reviewed and compared in this paper. Guidelines for inserting crosslinks within a buffered low power clock tree are provided. Metrics to determine the most power efficient technique for a given circuit are discussed and verified with simulation.

On-chip clock distribution networks toggle the global clock signal between high and low voltages at up to several gigahertz frequencies in modern circuits, dissipating a significant portion of the total power. These networks deliver a clock signal to the sequential elements within an integrated circuit. Accurate circuit operation is therefore highly dependent on the clock skew characteristics [

A clock distribution tree can be designed based on specified timing constraints, while using existing skew mitigation techniques such as buffer insertion and sizing [

A qualitative comparison of crosslink-based topologies with different crosslink densities is shown in

In this paper, different clock network topologies to mitigate skew variations under specific skew and power constraints are reviewed and compared. Skew variations and power consumption in crosslink-based clock distribution networks are analyzed based on a simplified clock tree model. The conclusions are generalized and guidelines for inserting crosslinks within a buffered clock tree are provided. Analytic expressions for the upper and lower bound of the energy consumed by a crosslink-based network with specific skew constraints are also provided. The power efficiency of variation-tolerant crosslink-based and mesh-based topologies is compared based on closed-form expressions and simulation results.

The rest of the paper is organized as follows. Skew and power tradeoffs are reviewed in Section 2 for different clock distribution networks in moderate and high speed, low power circuits. Metrics to determine the most power efficient non-tree topology are provided in Section 3 and discussed in Section 4 based on simulation results. The paper is summarized in Section 5. Closed-form expressions for the energy consumed by a clock tree section with a crosslink and the optimum crosslink parameters are derived, respectively, in

A clock tree is a common clock distribution topology. Existing design solutions, such as buffer insertion and sizing, and wire sizing are used to balance the propagation delays and skew between sequentially-adjacent registers [

Clock gating techniques can be applied to tree-based clock topologies, producing efficient, low power clock networks. Clock trees are also simpler to model and analyze. Nevertheless, clock trees are sensitive to skew variations that limit performance and may cause circuit malfunctions.

Existing skew variation mitigation techniques include non-tree clock distribution topologies [

Skew and power related terms in a clock distribution network are described in Section 2.1. Tree-based methods for skew mitigation are presented in Section 2.2. Mesh- and crosslink-based topologies are discussed, respectively, in Sections 2.3 and 2.4.

The skew between sequentially-adjacent registers within a clock distribution network is an important design issue. The skew affects the timing margins within the data paths, changing the speed and functional behavior of a circuit. The skew is affected by load imbalances and interconnect coupling within a clock network, which can be controlled and mitigated during the design process [

Low power has recently become a primary design objective. In particular, non-tree clock networks, which trade off skew and skew variations for power, are compared to more efficient, low power clock distribution networks. Dynamic energy scales with CV_{DD}^{2}, so crosslinks with a total capacitance C_{Crosslinks} dissipate an additional dynamic energy of C_{Crosslinks}V_{DD}^{2}.
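The dynamic term can be illustrated numerically. The following is a minimal sketch, assuming the additional dynamic energy per clock cycle is C_{Crosslinks}V_{DD}^{2}; the function name and the numerical values are hypothetical and are not taken from the paper.

```python
def added_dynamic_energy(c_crosslinks: float, vdd: float) -> float:
    """Energy drawn from the supply per clock cycle to charge and
    discharge the added crosslink capacitance: C * Vdd^2 (joules)."""
    if c_crosslinks < 0 or vdd < 0:
        raise ValueError("capacitance and supply voltage must be non-negative")
    return c_crosslinks * vdd ** 2


# Hypothetical example values (assumptions, not from the paper).
c_crosslinks = 200e-15  # total crosslink capacitance, in farads
vdd = 1.0               # supply voltage, in volts

energy = added_dynamic_energy(c_crosslinks, vdd)
print(f"additional dynamic energy: {energy * 1e15:.1f} fJ per cycle")

# The quadratic dependence on Vdd: halving the supply quarters the energy.
print(added_dynamic_energy(c_crosslinks, vdd / 2) / energy)
```

Because the term is linear in capacitance, any reduction in total crosslink wirelength translates directly into a proportional reduction in this component.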

Furthermore, the short-circuit current between the crosslink connected segments dissipates additional short-circuit energy. The wire resistance

Many guidelines and algorithms have been introduced for designing balanced power efficient clock distribution trees in synchronous integrated circuits [

Mesh structures balance clock delays and effectively lower the skew between nearby segments, mitigating skew variations [

Mesh topologies, however, utilize significant wire length, resulting in a large capacitance and, consequently, significant dynamic power consumption. Additional power is dissipated due to the short-circuit currents flowing within the buffers driving the crosslinks. The short-circuit power is a linear function of the skew between the buffers driving the crosslinks [

Modeling mesh-based clock distribution networks is complicated due to the inherent feedback within the topology. Accurate analytic expressions characterizing a mesh are highly complex and require significant computational time. Several techniques, such as the Skew Bound method in [

Connecting the nodes within a clock mesh affects the local clock delays, balancing the skew and skew variations between the sinks. Only a portion of the affected sinks, however, are sequentially-adjacent registers which are sensitive to clock skew and skew variations [_{1} and _{2}, the tolerance to variations can be improved while dissipating little power by inserting a crosslink connecting _{1} to _{2} (see _{1} and _{2} and the additional redundant crosslinks that connect the non-adjacent sinks. The total wirelength of the mesh shown in _{1} and _{2}, as depicted in

Multiple techniques to maintain useful skew in clock distribution trees have been described [_{In}_{1} and _{In}_{2}, and outputs _{Out}_{1} and _{Out}_{2} are connected with a crosslink

Inserting a crosslink within a clock tree reduces the skew between the crosslink connected segments, while consuming additional power. Closed-form expressions for the clock skew and power consumed by two clock tree segments with a crosslink are described in this section based on the simplified clock network shown in

An ideal step input signal driving each CMOS inverter is assumed in these analytic expressions. Under this assumption, a large portion of the transistor operation occurs within the linear region [_{ON}_{G}_{1} and _{G}_{2} of the output drivers is included in the capacitance model. The wires within the clock tree segments, depicted in _{1} (_{2}) shown in _{1} (_{2}), shown in

The skew at the output of the section, shown in _{In}_{1} and _{In}_{2} of the section plus the difference between the propagation delays τ_{1} and τ_{2} between _{In}_{1} and _{Out}_{1}, and _{In}_{2} and _{Out}_{2}, respectively (due to different _{OUT}_{DD}_{1} − τ_{2}| = 0.693 | _{1}_{1} − _{2}_{2} |. The energy consumed by two clock tree segments forming a section without a crosslink, shown in
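The factor 0.693 is ln 2, the 50% delay coefficient of a single-pole RC stage driven by an ideal step. The sketch below assumes each segment is modeled by a lumped resistance and capacitance (written here as `r1`, `c1`, `r2`, `c2`, names chosen for illustration) and that the output skew is the input skew plus the signed delay difference; the numerical values are hypothetical.

```python
import math

LN2 = math.log(2.0)  # approximately 0.693


def delay_50(r: float, c: float) -> float:
    """50% propagation delay of a single-pole RC stage with a step input."""
    return LN2 * r * c


def section_output_skew(input_skew: float, r1: float, c1: float,
                        r2: float, c2: float) -> float:
    """Signed skew at the section outputs: input skew plus (tau1 - tau2)."""
    return input_skew + delay_50(r1, c1) - delay_50(r2, c2)


# Hypothetical segment parameters (assumptions, not from the paper).
r1, c1 = 100.0, 50e-15  # ohms, farads
r2, c2 = 120.0, 50e-15

tau1, tau2 = delay_50(r1, c1), delay_50(r2, c2)
print(f"tau1 = {tau1 * 1e12:.3f} ps, tau2 = {tau2 * 1e12:.3f} ps")
print(f"|tau1 - tau2| = {abs(tau1 - tau2) * 1e12:.3f} ps")
print(f"output skew for zero input skew: "
      f"{section_output_skew(0.0, r1, c1, r2, c2) * 1e12:.3f} ps")
```

Under this model, two segments with equal RC products contribute no additional skew, which is the balanced (zero skew) condition.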

An ideal crosslink matches the propagation delay from the source of the clock tree to the crosslink connected segments, minimizing the skew between these segments. Inserting a crosslink between two non-zero skew segments may, however, affect the skew between the remaining clock tree segments [

A heuristic for inserting crosslinks should therefore be employed in a balanced clock tree: to preserve the useful skews within a balanced clock tree, the crosslinks between the zero skew segments need to be considered. These crosslinks would mitigate post-design skew variations, while producing similar propagation delays to the crosslink connected segments and, therefore, similar time constants,

A crosslink _{X}_{X}_{SC}_{In}_{1} = 0 and _{In}_{2} = 1), as illustrated by the dotted line shown in

The total current flowing through _{2}, shown in _{X}_{1} and ½_{X}_{2}, and the other current connected to ground through _{1}. The short-circuit current with a crosslink increases with lower crosslink resistance _{X}_{1}) dissipates short-circuit energy. Crosslinks with high resistance _{X}_{1} for a step input and slow input ramp, and for different values of _{X}

At the open-circuit limit (_{X}

The total energy consumption once the first input (_{In}_{2}) switches and until the output capacitors are charged, based on the circuit models depicted in

Note that not all of this dynamic energy consumed during

The short-circuit energy
_{X}_{X}_{X}_{X}_{X}

Similar to _{X}^{−2R/Rx} [_{1} = _{2} = _{1} = _{2} = _{X}_{1}(½_{X}_{1}) ≈ _{2}(½_{X}_{2}) _{50}_{%} is _{1}(_{50%}) = ½_{DD}

To design an efficient crosslink-based network, three decisions must be made regarding the crosslinks: (1) which pairs of clock tree segments should be connected by a crosslink, (2) where within each pair of segments the crosslink should be placed, and (3) the physical characteristics of the crosslinks. Guiding principles for crosslink insertion are provided in this section based on the analytic expressions described in Section 2.4.1.

The first design issue is determining which segments to insert a crosslink to reduce skew variations between sequentially-adjacent registers, while preserving useful skew in balanced clock trees. Any two clock tree segments located upstream to a pair of sequentially-adjacent registers, _{1} and _{2}, can be connected with a crosslink to mitigate skew variations between the sequentially-adjacent sinks, as depicted in _{TH}

The second design issue is determining the location of the crosslink between two zero skew segments. Skew variations between two zero skew nodes can be regulated by inserting a crosslink between the nodes. Thus, the primary objective in choosing the specific location of the crosslink within a clock tree section is to lower the total energy consumption. The additional energy from the crosslink is the sum of the dynamic energy due to the added wire capacitance and the short-circuit energy dissipated between the crosslink-connected segments. The additional dynamic energy is not significantly affected by the specific crosslink location within the clock tree segment. Alternatively, inserting a crosslink far from the input driver of a section increases the short-circuit path resistance, decreasing the total energy consumption.
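The effect of the crosslink location on the short-circuit energy can be sketched with a first-order resistive model. This is an assumption made for illustration, not the closed-form expression derived in the paper: while the two section inputs disagree, a current V_{DD}/R_path is assumed to flow from the supply through one driver, the wire upstream of the crosslink on each segment, the crosslink itself, and the other driver to ground. All parameter names and values are hypothetical, and capacitive transients are ignored.

```python
def short_circuit_energy(vdd: float, t_overlap: float, r_on: float,
                         r_wire: float, r_x: float, position: float) -> float:
    """First-order short-circuit energy (joules) for a crosslink placed at
    a fractional position along the segments (0 = at the section drivers,
    1 = at the far end). Assumed model: E = Vdd^2 * t_overlap / R_path."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    # Path: driver 1 + upstream wire 1 + crosslink + upstream wire 2 + driver 2.
    r_path = 2.0 * (r_on + position * r_wire) + r_x
    return vdd ** 2 * t_overlap / r_path


# Hypothetical values (assumptions, not from the paper).
vdd = 1.0           # volts
t_overlap = 20e-12  # seconds during which the two inputs disagree
r_on = 500.0        # driver ON resistance, ohms
r_wire = 200.0      # total wire resistance of one segment, ohms
r_x = 100.0         # crosslink resistance, ohms

positions = [0.0, 0.5, 1.0]
energies = [short_circuit_energy(vdd, t_overlap, r_on, r_wire, r_x, p)
            for p in positions]
for p, e in zip(positions, energies):
    print(f"position {p:.1f}: {e * 1e15:.2f} fJ")
```

In this simplified model the energy decreases monotonically as the crosslink moves away from the drivers, consistent with the guideline above; the added wire capacitance, and hence the dynamic energy, does not depend on the position.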

The third design issue is the type of crosslink to place between segments. Given a crosslink _{X}_{X}_{X}_{X}_{X}_{X}

Low wirelength utilization, the availability of efficient techniques for locally controlling skew, and lower power are important advantages of tree-based clock distribution networks as compared to non-tree topologies. The reliability of clock trees in high performance, variation-sensitive circuits, however, is reduced. Thus, in moderate and low performance circuits with aggressive power and area constraints, clock trees are preferable. Alternatively, in high performance circuits, non-tree topologies are preferred.

Non-tree clock networks are shown here to be an efficient alternative to a tree topology for coping with skew variations within clock distribution trees. Two zero skew segments upstream from a sequentially-adjacent variation sensitive pair of registers should be connected with a crosslink to mitigate these variations. Thus, to attenuate predicted skew variations between

At the limit, for large values of

Given an energy consumption of a clock tree _{Tree}_{Mesh}_{X}_{X}_{X}_{X}_{X}_{TH}

Given a crosslink _{X}_{X}_{TH}

_{Opt}_{OPT}

_{OPT}_{OPT}_{min},_{min}], [_{min},_{min}]), and _{x}_{OPT}, t_{OPT}

The variables _{X}_{TH}

The upper bound on the additional energy _{X,Max}_{Tree}_{1} + _{2})_{DD}^{2} from

Finally, the total additional energy from inserting a crosslink within a clock tree with _{Mesh}

The expression in _{1} = _{2} = _{1} = _{2} =

A crosslink-based topology should therefore be used to provide low power while mitigating skew variations when E_{X,MAX} < E_{Mesh}.
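The comparison can be expressed as a simple check. The sketch below reads the metric as "the upper bound on the crosslink energy, E_{X,MAX}, is lower than the mesh energy, E_{Mesh}", which is an interpretation of the text; the function name is hypothetical. The values are those listed for the zero skew clock tree with moderate skew variations, in the units of the tables.

```python
def prefer_crosslinks(e_x_max: float, e_mesh: float) -> bool:
    """True when the upper bound on the additional crosslink energy is
    below the additional energy of the mesh (assumed reading of the metric)."""
    return e_x_max < e_mesh


# Analytic upper bounds E_X,MAX for the crosslink-based topologies.
e_x_max = {"local crosslinks": 0.23, "global crosslinks": 2.53}
# Energies E_MESH for the mesh-based topologies.
e_mesh = {"intermediate-level sparse mesh": 3.76,
          "intermediate-level dense mesh": 5.97}

cheapest_mesh = min(e_mesh.values())
for name, bound in e_x_max.items():
    print(f"{name}: prefer crosslinks = {prefer_crosslinks(bound, cheapest_mesh)}")
```

Because E_{X,MAX} is an upper bound, the check is conservative: a crosslink-based topology that passes it dissipates no more than the mesh, whereas a topology that fails it may still dissipate less in practice.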

Several examples of moderate and large skew variations that effectively exploit non-tree topologies are described in this section for a zero skew clock tree and a clock tree with certain useful non-zero skew constraints. Different mesh- and crosslink-based topologies are considered. The decision regarding a preferable non-tree topology is based on the energy efficiency metric (_{X,MAX}_{Mesh}_{TH}_{P}_{P}_{TH}

For the zero skew clock tree, the largest skew, number of skew violations between sequentially-adjacent registers, and additional energy due to the inserted crosslinks or mesh connections are listed in _{TH}

Based on SPICE simulations for the case of moderate skew variations (_{X,MAX}_{MESH}

An analytic estimate of the energy is also listed in _{X,MAX}_{MESH}_{X,MAX}_{MESH}

Note that a more efficient solution may be achieved by either a mesh or a crosslink-based topology in certain clock networks, as shown in the aforementioned examples. The purpose of this work, as demonstrated by the example networks, is to provide metrics for determining the more efficient non-tree method to mitigate skew variations rather than suggest a general topology for any clock network.
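The selection between topologies in the examples can be reproduced directly from the tabulated results. The sketch below assumes the selection rule "among the topologies with no remaining skew violations, choose the one with the lowest simulated additional energy"; the rule, the type, and the function name are illustrative. The data are the two cases with large (up to 50%) skew variations.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    topology: str
    violations: int  # remaining skew violations
    energy: float    # simulated additional energy, in the units of the tables


def cheapest_feasible(results: List[Result]) -> Optional[Result]:
    """Lowest-energy topology among those with zero skew violations."""
    feasible = [r for r in results if r.violations == 0]
    return min(feasible, key=lambda r: r.energy) if feasible else None


zero_skew_large = [
    Result("local crosslinks", 0, 0.08),
    Result("global crosslinks", 0, 1.34),
    Result("intermediate-level sparse mesh", 28, 3.75),
    Result("intermediate-level dense mesh", 28, 5.91),
    Result("sink-level sparse mesh", 2, 4.07),
    Result("sink-level dense mesh", 0, 6.28),
]

useful_skew_large = [
    Result("local crosslinks", 16, 0.79),   # listed as ">0.79"
    Result("global crosslinks", 16, 1.07),  # listed as ">1.07"
    Result("intermediate-level sparse mesh", 0, 3.43),
    Result("intermediate-level dense mesh", 0, 5.44),
]

print(cheapest_feasible(zero_skew_large).topology)    # local crosslinks
print(cheapest_feasible(useful_skew_large).topology)  # intermediate-level sparse mesh
```

The two cases select different topologies, which illustrates why a per-circuit metric is used rather than a single recommended topology.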

Different topologies and techniques to design power efficient clock distribution networks at several operating frequencies are reviewed in this paper. For low power circuits that operate at moderate and low frequencies, a buffered clock tree may be the preferable method. To satisfy a specific set of timing constraints, a balanced, low power clock tree can be efficiently designed using existing techniques. Existing skew solutions in tree-based networks, however, are not efficient in mitigating manufacturing-induced variations. Thus, in modern circuits with aggressive timing requirements, non-tree topologies should be considered to cope with skew variations. Mesh-based solutions have recently been shown to reliably mitigate skew variations through the use of a symmetric mesh structure, albeit at significantly higher power. Alternatively, mesh redundancy can be avoided in crosslink-based topologies to mitigate skew variations at lower power.

Guidelines for crosslink insertion in a balanced clock tree are presented in this paper. To maintain a target skew between sequentially-adjacent registers, a heuristic is proposed for inserting crosslinks within a balanced clock tree between zero skew segments located upstream of those sequentially-adjacent registers that violate timing constraints. In addition, the crosslink should be inserted as far as possible from the section drivers for enhanced tolerance to variations at lower power. The optimum crosslink parameters under zero skew constraints are also presented. Tradeoffs between energy consumption and skew variations in crosslink-based topologies are investigated in this paper based on analytic expressions and simulations, demonstrating that crosslinks with lower resistance should be used to enhance the tolerance of a circuit to manufacturing-induced variations, whereas crosslinks with high resistance and therefore low capacitance should be used in low power clock networks. Analytic expressions are also described to determine the most power efficient clock network topology under specific timing constraints. Simulation results are presented, confirming the conclusions of the theoretical analysis regarding the choice of topology for low power, variation-tolerant clock networks.

Power

Clock tree composed of the source, trunk, segments, and sinks.

Non-tree clock network topologies, (

An example of the excessive wirelength and power of a mesh as compared to a crosslink-based topology.

Two clock tree segments (

Two clock tree segments with impedance model (

Two clock tree segments connected with a crosslink. The dotted line illustrates the short-circuit current path for _{In}_{1} = “0” and _{In}_{2} = “1”.

Current through _{1} for (

Circuit model of clock tree section for (_{In}_{1} = “0”, _{In}_{2} = “1”) and (_{In}_{1} = “1”, _{In}_{2} = “1”).

Output voltage waveforms _{ClkOut}_{1}(_{ClkOut}_{2}(

Current components for _{In}_{1} = 1, _{In}_{2} = 0).

Mitigation of skew variations between _{1} and _{2} with a crosslink. A crosslink should be inserted at (_{1} and _{2}.

Mitigating skew variations within three pairs of registers, (_{1}, _{2}), (_{1}, _{4}), and (_{2}, _{3}), with (

Power efficient non-tree topologies to mitigate skew variations (

Comparison of different non-tree approaches to mitigate moderate (up to 20%) skew variations within a zero skew clock tree.

Topology | Largest skew | Largest skew (% of T_{P}) | Skew violations | Skew violations (%) | Additional energy (simulated) | Additional energy (analytic) |
---|---|---|---|---|---|---|
Clock tree | 51.56 | 5.16 | 64 | 53.33 | 0.00 | 0.00 |
With local crosslinks | 31.26 | 3.13 | 0 | 0.00 | 0.07 | 0.23 (E_{X,MAX}) |
With global crosslinks | 32.03 | 3.20 | 0 | 0.00 | 1.20 | 2.53 (E_{X,MAX}) |
With intermediate-level sparse mesh | 34.91 | 3.49 | 0 | 0.00 | 3.76 (E_{MESH}) | N/A |
With intermediate-level dense mesh | 35.62 | 3.56 | 0 | 0.00 | 5.97 (E_{MESH}) | N/A |

Comparison of different non-tree approaches to mitigate large (up to 50%) skew variations within a zero skew clock tree.

Topology | Largest skew | Largest skew (% of T_{P}) | Skew violations | Skew violations (%) | Additional energy (simulated) | Additional energy (analytic) |
---|---|---|---|---|---|---|
Clock tree | 71.28 | 7.13 | 64 | 53.33 | 0.00 | 0.00 |
With local crosslinks | 35.96 | 3.60 | 0 | 0.00 | 0.08 | 0.24 (E_{X,MAX}) |
With global crosslinks | 34.61 | 3.46 | 0 | 0.00 | 1.34 | 2.80 (E_{X,MAX}) |
With intermediate-level sparse mesh | 67.18 | 6.72 | 28 | 23.33 | 3.75 (E_{MESH}) | N/A |
With intermediate-level dense mesh | 66.49 | 6.65 | 28 | 23.33 | 5.91 (E_{MESH}) | N/A |
With sink-level sparse mesh | 53.27 | 5.33 | 2 | 1.67 | 4.07 (E_{MESH}) | N/A |
With sink-level dense mesh | 46.16 | 4.62 | 0 | 0.00 | 6.28 (E_{MESH}) | N/A |

Comparison of different non-tree approaches to mitigate moderate (up to 20%) skew variations within a clock tree with a useful skew schedule.

Topology | Largest skew | Largest skew (% of T_{P}) | Skew violations | Skew violations (%) | Additional energy (simulated) | Additional energy (analytic) |
---|---|---|---|---|---|---|
Clock tree | 77.81 | 7.78 | 64 | 53.33 | 0.00 | 0.00 |
With local crosslinks | 45.01 | 4.50 | 0 | 0.00 | 0.80 | 0.82 (E_{X,MAX}) |
With global crosslinks | 43.64 | 4.36 | 0 | 0.00 | 0.98 | 2.64 (E_{X,MAX}) |
With intermediate-level sparse mesh | 43.00 | 4.30 | 0 | 0.00 | 3.45 (E_{MESH}) | N/A |
With intermediate-level dense mesh | 43.00 | 4.30 | 0 | 0.00 | 5.48 (E_{MESH}) | N/A |

Comparison of different non-tree approaches to mitigate large (up to 50%) skew variations within a clock tree with a useful skew schedule.

Topology | Largest skew | Largest skew (% of T_{P}) | Skew violations | Skew violations (%) | Additional energy (simulated) | Additional energy (analytic) |
---|---|---|---|---|---|---|
Clock tree | 96.83 | 9.68 | 64 | 53.33 | 0.00 | 0.00 |
With local crosslinks | 61.86 | 6.19 | 16 | 13.33 | >0.79 | >1.39 (E_{X,MAX}) |
With global crosslinks | 60.77 | 6.08 | 16 | 13.33 | >1.07 | >3.31 (E_{X,MAX}) |
With intermediate-level sparse mesh | 47.84 | 4.78 | 0 | 0.00 | 3.43 (E_{MESH}) | N/A |
With intermediate-level dense mesh | 39.39 | 3.94 | 0 | 0.00 | 5.44 (E_{MESH}) | N/A |

The voltage at the output of a clock tree section and energy expressions for _{In}_{1} ≠ _{In}_{2}) and for _{In}_{1} = _{In}_{2}).

Differential equations are determined from _{In}_{1} ≠ _{In}_{2}, _{In}_{1} ≠ _{In}_{2}, _{1}_{ClkOut}_{1}(_{2}_{ClkOut}_{2}(_{1}(½_{X}_{1}) ≈ _{2}(½_{X}_{2}). The total energy consumed during the time interval [0,

To determine the additional energy consumed for _{In}_{1} = _{In}_{2},

The solution of _{In}_{1} = _{In}_{2}, _{1}_{ClkOut}_{1}(_{2}_{ClkOut}_{2}(_{1}(½_{X}_{1}) ≈ _{2}(½_{X}_{2}).

The energy consumed for

The total energy consumption once the first input _{In}_{2} switches until the output capacitors are fully charged is

The optimum crosslink resistance under the zero skew _{X}_{TH}_{X}_{TH}

Skew variations between crosslink connected segments decrease with lower crosslink resistance _{X} = ρl_{X}_{TH}_{X}_{TH}

Interconnect parameters.

Optimum crosslink resistance
_{X}_{TH}

Given length _{X}_{TH}

Optimum crosslink capacitance
_{X}_{TH}

Note that _{min}_{min}_{min}_{min}_{min}_{min}_{X}_{min}_{min}_{min}_{min}_{X}_{1} = 2_{g}_{1} + 2_{C}_{1} for the local and intermediate layers and _{X}_{2} = _{g}_{2} + 2_{C}_{2} for the global interconnect, where
_{g}_{1}), and global (_{g}_{2}) layers and
_{C}_{1}), and global (_{C}_{2}) interconnect. Thus, given an interconnect length

Bounds of the crosslink width and thickness based on the minimum feature size and

Note that the derivatives _{X}_{X}_{X}

The partial derivative of

Higher (lower) values of _{STAT}_{STAT}_{X}_{STAT}_{STAT}_{STAT}_{STAT}_{min}_{min}_{min}_{min}_{X}_{STAT}_{STAT}_{OPT}_{OPT}_{STAT}_{STAT}

Finally, _{OPT}_{OPT}_{X}_{X}_{TH}