Low Power Clock Network Design



Introduction
On-chip clock distribution networks toggle the global clock signal between high and low voltages at frequencies of up to several gigahertz in modern circuits, dissipating a significant portion of the total power. These networks deliver a clock signal to the sequential elements within an integrated circuit. Accurate circuit operation is therefore highly dependent on the clock skew characteristics [1]. The clock skew within a clock distribution network is, in particular, an important factor that affects timing margins and circuit operation. Thus, the distribution of the clock signal is a critical design issue that affects overall system timing and reliability, and the clock network must also be power efficient.
A clock distribution tree can be designed based on specified timing constraints, using existing skew mitigation techniques such as buffer insertion and sizing [2][3][4] and wire sizing [3][4][5] to produce the target skews. Localized clock skew scheduling [1] and clock gating techniques [6] can also be applied in tree-based clock topologies to lower power. Clock skew, however, is subject to process, voltage, and temperature (PVT) variations that affect the clock skew schedule, limiting performance and functionality. Furthermore, skew variations have become increasingly significant as clock periods shrink, requiring variation-tolerant, low power solutions. Non-tree topologies have been introduced for the variation-tolerant design of high performance clock distribution networks. The density of the non-tree elements in these topologies may vary from a few additional connections (or crosslinks) [20][21][22][23][24][25][26] to a completely dense mesh structure [6][7][8][9][10][11][12][13][14][15][16][17][18][19] that covers the entire network with crosslinks. The crosslinks between clock tree segments provide alternative paths for the clock signal, maintaining delay balance while mitigating the skew caused by both imbalances and PVT variations between the connected segments. Thus, tolerance to variations increases with a larger number of crosslinks. The dynamic power dissipated by the inserted crosslinks, however, is also proportional to the number of connections. In addition, short-circuit currents [21] flow between the connected segments, dissipating short-circuit power that also increases with the number of crosslinks. Note that clock gating is not applicable in non-tree networks, which limits local control of the clock distribution network and, therefore, the ability to manage power consumption.
A qualitative comparison of crosslink-based topologies with different crosslink densities is shown in Figure 1 in terms of power dissipation and skew variations. The power dissipated by non-tree clock distribution networks can therefore be traded off for skew tolerance. In some integrated circuits, an efficient power-skew tradeoff can be achieved with a mesh-based topology, while in other circuits a crosslink-based network is preferable to produce a variation-tolerant, low power clock distribution network.
In this paper, different clock network topologies that mitigate skew variations under specific skew and power constraints are reviewed and compared. Skew variations and power consumption in crosslink-based clock distribution networks are analyzed based on a simplified clock tree model. The conclusions are generalized, and guidelines for inserting crosslinks within a buffered clock tree are provided. Analytic expressions for the upper and lower bounds of the energy consumed by a crosslink-based network with specific skew constraints are also provided. The power efficiency of variation-tolerant crosslink-based and mesh-based topologies is compared based on closed-form expressions and simulation results. The rest of the paper is organized as follows. Skew and power tradeoffs for different clock distribution networks in moderate and high speed, low power circuits are reviewed in Section 2. Metrics to determine the most power efficient non-tree topology are provided in Section 3 and discussed in Section 4 based on simulation results. The paper is summarized in Section 5. Closed-form expressions for the energy consumed by a clock tree section with a crosslink and for the optimum crosslink parameters are derived, respectively, in Appendices A and B.

Skew Mitigation Techniques
A clock tree is a common clock distribution topology. Existing design techniques, such as buffer insertion and sizing and wire sizing, are used to balance the propagation delays within a clock tree so that the skew between sequentially-adjacent registers satisfies the permissible range constraints [6]. A buffered clock tree consists of a source buffer that drives the trunk of the clock tree, the internal buffer-interconnect-buffer segments, and the sequential gates at the sinks of the clock tree, as shown in Figure 2.
Clock gating techniques can be applied to tree-based clock topologies, producing efficient, low power clock networks. Clock trees are also simpler to model and analyze. Nevertheless, clock trees are sensitive to skew variations that limit performance and may cause circuit malfunctions.
Existing techniques for mitigating skew variations include non-tree clock distribution topologies, where alternative paths for the clock signal are provided to manage the local skew, thereby maintaining temporal balance. A crosslink-based topology [20][21][22][23][24][25][26] is a non-uniform, asymmetric tree-based structure with a varying density of wire segments, each connecting two segments within a clock tree. The design of a crosslink-based clock network depends on three characteristics: the location of the crosslinks within a clock tree (in terms of the segments connected by each crosslink), the specific location of the crosslink between the connected segments, and the size of the crosslink. Alternatively, crosslinks may connect all or a specific group of adjacent segments within a specific level of a clock tree, forming a regular, symmetric mesh-based [6][7][8][9][10][11][12][13][14][15][16][17][18][19] clock network (see Figure 3).

Skew and Power: Definitions and Background
The skew between sequentially-adjacent registers within a clock distribution network is an important design issue. The skew affects the timing margins within the data paths, changing the speed and functional behavior of a circuit. The skew is affected by load imbalances and interconnect coupling within a clock network, which can be controlled and mitigated during the design process [1]. This skew, however, is subject to post-design PVT variations that can significantly change the skew within a balanced clock network, adversely affecting circuit operation. Useful clock skew is only relevant between sequentially-adjacent registers and can be positive, negative, or zero [1]. A large negative clock skew can cause a race condition between sequentially-adjacent registers, while a large positive clock skew may limit circuit performance. The network should therefore be carefully designed to ensure that the local skew is within the permissible skew range [1]. In current circuits, skew variations can be of the same order of magnitude as the clock period [27]. Post-design skew variations should therefore be mitigated to ensure that the nominal skew, including variations, remains within the permissible skew range.
Low power has recently become a primary design objective. In particular, non-tree clock networks, which trade off power for reduced skew and skew variations, are compared here to identify more efficient, low power clock distribution networks. The dynamic power $P_{DYN} = \alpha C V_{DD}^2 f$ consumed by a clock distribution network is proportional to the total capacitance C of the clock network and load, where α is the switching activity, $V_{DD}$ is the supply voltage, and f is the clock frequency.
Adding crosslinks with a total capacitance $C_{Crosslinks}$ to a clock tree increases the dynamic power of the clock network by $\alpha C_{Crosslinks} V_{DD}^2 f$. Since the wire capacitance is linearly proportional to the wire length l and increases with larger wire width w and thickness t [28], the dynamic energy consumption increases with longer and wider crosslinks. Furthermore, the short-circuit current between the segments connected by a crosslink dissipates additional short-circuit energy. The wire resistance $R = \rho l/(wt)$ is linearly proportional to the wire length l and inversely proportional to the wire width w and thickness t, where ρ is the resistivity of the line. Long, narrow crosslinks are therefore more resistive and limit the short-circuit current, mitigating the short-circuit power dissipation.
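As a rough numerical illustration of these relations, the following Python sketch computes the resistance $R = \rho l/(wt)$ of a crosslink, an estimate of its capacitance, and the resulting added dynamic power $\alpha C V_{DD}^2 f$. The capacitance model is an assumption made only for this example. It uses the commonly cited Sakurai-Tamaru approximation for a single wire over a ground plane, not the interconnect models of [28]. All function names and parameter values are hypothetical.

```python
EPS0 = 8.854e-12  # vacuum permittivity (F/m)


def wire_resistance(rho, l, w, t):
    """Wire resistance R = rho * l / (w * t), as in the text."""
    return rho * l / (w * t)


def wire_capacitance(l, w, t, h, eps_r):
    """Assumed model (Sakurai-Tamaru, single wire over a ground plane):
    C = eps * l * (1.15 * w/h + 2.80 * (t/h)**0.222), h = height above the plane."""
    eps = eps_r * EPS0
    return eps * l * (1.15 * w / h + 2.80 * (t / h) ** 0.222)


def dynamic_power(alpha, c, vdd, f):
    """Dynamic power alpha * C * V_DD^2 * f."""
    return alpha * c * vdd ** 2 * f


if __name__ == "__main__":
    # Hypothetical 200 um crosslink; only the width changes between the two cases.
    rho, l, t, h, eps_r = 2.2e-8, 200e-6, 0.5e-6, 0.4e-6, 3.0
    for w in (0.3e-6, 0.6e-6):
        r = wire_resistance(rho, l, w, t)
        c = wire_capacitance(l, w, t, h, eps_r)
        p = dynamic_power(1.0, c, 1.8, 1e9)  # alpha = 1 assumed for a clock net
        print(f"w = {w * 1e6:.1f} um: R = {r:.1f} ohm, C = {c * 1e15:.1f} fF, P = {p * 1e6:.1f} uW")
```

Doubling the width halves the crosslink resistance while increasing its capacitance. This is the resistance-capacitance tradeoff that the crosslink sizing discussion relies on.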

Clock Tree Topology
Many guidelines and algorithms have been introduced for designing balanced, power efficient clock distribution trees in synchronous integrated circuits [2][3][4][5][6]. Many different clock tree topologies are used, ranging from asymmetric structures to symmetric trees, such as H-trees and X-trees [6]. A buffered tree is a common approach to distribute a clock signal to the sequential gates while satisfying a specific clock skew schedule. Enhanced control and accuracy of the distributed clock signal waveforms can be obtained by buffer and wire sizing. A tree-based topology can also be accurately modeled by closed-form analytic expressions [6]. Various techniques, such as localized clock skew scheduling and clock gating [6], have been developed to reduce the power consumed by a clock tree. In high performance circuits, however, post-design skew variations adversely affect the nominal target skew, decreasing the reliability of tree-based clock networks. Thus, non-tree alternatives such as a mesh should be considered to mitigate skew variations in high performance circuits.

Mesh-Based Clock Topology
Mesh structures balance clock delays and effectively lower the skew between nearby segments, mitigating skew variations [6][7][8][9][10][11][12][13][14][15][16][17][18][19]. Mesh-based clock distribution networks have been used in a variety of commercial high performance microprocessors, such as the Power4 [16], Digital Alpha [17], Intel Pentium 4 [18], and Xeon [19], effectively addressing clock skew and skew variations. Mesh topologies, however, require significant wire length, resulting in a large capacitance and, consequently, significant dynamic power consumption. Additional power is dissipated due to the short-circuit currents flowing within the buffers driving the crosslinks. The short-circuit power is a linear function of the skew between the buffers driving the crosslinks [21], and can account for more than 80% of the total power in highly unbalanced mesh networks [9]. Both uniform and non-uniform mesh topologies have recently been investigated, demonstrating lower skew and higher variation tolerance in dense grids. The number of crosslinks and the mesh wire length, however, increase with mesh density, resulting in higher short-circuit and dynamic power [6][7][8][9][10][11][12][13][14][15][16][17][18][19]. Thus, dynamic and short-circuit power can be traded off for skew. Mesh reduction [10], sizing of the buffers driving the crosslinks [9], and cost function-based algorithms have been proposed to reduce power consumption [8][9][10]. High power consumption, however, remains the primary disadvantage of mesh-based clock distribution networks.
Modeling mesh-based clock distribution networks is complicated due to the inherent feedback within the topology. Accurate analytic expressions characterizing a mesh are highly complex and require significant computational time. Several techniques, such as the Skew Bound method in [8] and the Sliding Window Scheme in [9], have recently been proposed to estimate the skew and power of mesh-based clock networks. Modeling the buffers driving the crosslinks to reduce the computational complexity of the analysis has also been considered [9]. To improve the scalability of clock mesh analysis, reduced order modeling and port sliding can be used [11][12]. In addition, decomposition of the clock mesh into linear and nonlinear subsystems and a dynamic time step rounding technique [13] are employed to reduce the number of macromodels required to represent a mesh system.
Connecting the nodes within a clock mesh affects the local clock delays, balancing the skew and skew variations between the sinks. Only some of the affected sinks, however, are sequentially-adjacent registers that are sensitive to clock skew and skew variations [1]. Thus, crosslinks between non-sequentially-adjacent sinks do not affect circuit operation and unnecessarily dissipate dynamic and short-circuit power. The regularity of mesh-based topologies, however, prevents these crosslinks from being removed. An example of the excessive redundancy of mesh-based solutions is illustrated in Figure 4b. For a clock tree with two sequentially-adjacent sinks, $Reg_1$ and $Reg_2$, the tolerance to variations can be improved while dissipating little power by inserting a crosslink connecting $Reg_1$ to $Reg_2$ (see Figure 4a). This crosslink efficiently mitigates variations within the highlighted paths shown in Figure 4a. A sink-level mesh is depicted in Figure 4b with a crosslink between $Reg_1$ and $Reg_2$ and the additional redundant crosslinks that connect the non-adjacent sinks. The total wirelength of the mesh shown in Figure 4b is therefore significantly greater than the length of the single crosslink depicted in Figure 4a. A sink-level mesh therefore reduces variations, but at significantly higher power. Alternatively, an intermediate-level mesh mitigates PVT variations primarily between the upper clock tree segments, resulting in higher skew variations between the sequentially-adjacent sinks, $Reg_1$ and $Reg_2$, as depicted in Figure 4b. Crosslink-based topologies therefore provide additional degrees of design freedom while potentially dissipating significantly less power.

Crosslink-Based Clock Topology
Multiple techniques to maintain useful skew in clock distribution trees have been described [20][21][22][23][24][25][26], providing resource efficient, low power skew solutions. The sensitivity of clock distribution trees to PVT variations, however, increases with circuit speed and technology scaling, resulting in large skew variations. Given a clock tree that satisfies useful skew constraints, crosslinks can be inserted that maintain the useful skew schedule while lowering the variations in skew. Guidelines, however, should be established regarding (1) which clock tree segments should be connected with a crosslink, (2) the crosslink location between the selected segments, and (3) the physical characteristics of the crosslink. This topic is considered in this section. Power and skew tradeoffs are reviewed in Section 2.4.1 for a simplified clock network (see Figure 5), where two clock tree segments with inputs $Clk_{In1}$ and $Clk_{In2}$ and outputs $Clk_{Out1}$ and $Clk_{Out2}$ are connected with a crosslink X, modeled as a lumped RC wire. These results are generalized in Section 2.4.2 to provide guidelines for inserting multiple crosslinks.

Power and Skew Tradeoffs in Simplified Crosslink-Based Clock Networks
Inserting a crosslink within a clock tree reduces the skew between the crosslink connected segments, while consuming additional power. Closed-form expressions for the clock skew and power of two clock tree segments with a crosslink are described in this section based on the simplified clock network shown in Figure 5. An ideal step input signal driving each CMOS inverter is assumed in these analytic expressions. Under this assumption, the transistor operates primarily within the linear region [29], permitting the driver to be modeled as a linear resistor $R_{ON}$. Furthermore, the input capacitances $C_{G1}$ and $C_{G2}$ of the output drivers are included in the capacitance model. The wires within the clock tree segments, depicted in Figure 5, are modeled as lumped RC impedances. A model of the section impedance is depicted in Figure 6. The input resistance of segment 1 (2), represented by $R_1$ ($R_2$) in Figure 6, is composed of the wire resistance connected in series with the transistor. The load capacitance, represented by $C_1$ ($C_2$) in Figure 6, is composed of the wire capacitance connected in parallel with the input gate capacitance. The skew at the output of the section, shown in Figure 6b, is the skew T between the inputs $Clk_{In1}$ and $Clk_{In2}$ of the section plus the difference between the propagation delays $\tau_1$ (from $Clk_{In1}$ to $Clk_{Out1}$) and $\tau_2$ (from $Clk_{In2}$ to $Clk_{Out2}$), which differ due to the different RC loads. Measuring each delay at $V_{OUT} = \tfrac{1}{2}V_{DD}$ [14], the lumped RC response gives $\tau_i = R_i C_i \ln 2 \approx 0.69\,R_i C_i$. With $Clk_{In1}$ switching T after $Clk_{In2}$, the output skew is
$$T_{OUT} = T + \tau_1 - \tau_2 = T + \ln 2\,(R_1 C_1 - R_2 C_2).$$
The energy consumed by two clock tree segments forming a section without a crosslink, shown in Figure 6b, is the energy drawn from the supply to charge both load capacitances to $V_{DD}$,
$$E = (C_1 + C_2)\,V_{DD}^2.$$
An ideal crosslink matches the propagation delay from the source of the clock tree to the crosslink connected segments, minimizing the skew between these segments. Inserting a crosslink between two non-zero skew segments may, however, affect the skew between the remaining clock tree segments [21]. In contrast, inserting a crosslink only between zero skew segments preserves zero skew between the connected segments and ensures that the skews between all of the clock tree segments remain unchanged by the crosslink.
A heuristic for inserting crosslinks in a balanced clock tree therefore follows: to preserve the useful skews within the tree, only crosslinks between zero skew segments should be considered. These crosslinks mitigate post-design skew variations, while the crosslink connected segments exhibit similar propagation delays and, therefore, similar time constants,

$$R_1 C_1 \approx R_2 C_2.$$
A crosslink X can be modeled as a lumped RC impedance with a non-zero resistance $R_X$ and capacitance $C_X$, thereby dissipating dynamic power to charge the crosslink capacitance. Additional power is dissipated by the short-circuit current $I_{SC}$ through the crosslink when the inputs are at different polarities (e.g., $Clk_{In1} = 0$ and $Clk_{In2} = 1$), as illustrated by the dotted line in Figure 7. The total current flowing through $R_2$, shown in Figure 7, is composed of two currents: one charging the capacitors $\tfrac{1}{2}C_X + C_1$ and $\tfrac{1}{2}C_X + C_2$, and the other flowing to ground through $R_1$. The short-circuit current with a crosslink increases as the crosslink resistance $R_X$ decreases. While the inputs are skewed in time, as shown in Figure 8, the output voltage is lower and the transistor represented by $R_1$ dissipates short-circuit energy. Crosslinks with a high resistance $R_X$ between low skew segments should therefore be inserted to lower the power dissipation. The current through $R_1$ for a step input and for a slow input ramp, for different values of $R_X$, is illustrated in Figure 8. At the open-circuit limit ($R_X \to \infty$), however, the crosslink no longer balances the delay to the connected segments, yet still dissipates dynamic power. A circuit model of the simplified network with a crosslink, shown in Figure 6a, is presented in Figure 9 for $t \le T$ and $t > T$. Waveforms of the voltage at the output of the clock tree section with and without a crosslink are illustrated in Figure 10, exhibiting a significant reduction in skew with a crosslink. The total energy consumed from the time the first input ($Clk_{In2}$) switches until the output capacitors are charged is derived in Appendix A based on the circuit models depicted in Figure 9, yielding (3). The first term in (3) describes the short-circuit energy $E_X^{SH}$, which increases linearly with T.
The derivative with respect to T of the second term, the dynamic energy $E_X^{DYN}$ required to charge the output capacitance, is negative, yielding the maximum dynamic energy at T = 0 and the upper bound of the total energy. The exponential terms in (3) range between 0 and 1, yielding the lower energy bound. Note that not all of the dynamic energy consumed during $t \le T$ is useful. The total current (shown in Figure 11) comprises the current that charges the output capacitors (the solid arrow in the figure), the current that discharges the output capacitors (the dashed arrows), and the short-circuit current (the crossed arrows). The short-circuit energy $E_X^{SH}$ increases as $R_X$ is reduced, while the dynamic energy $E_X^{DYN}$ increases with increasing $C_X$ (decreasing $R_X$). The derivative $\partial E/\partial R_X$ is therefore negative, exhibiting lower energy for higher $R_X$.
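These trends can be checked numerically. The following Python sketch is a forward-Euler transient simulation of the lumped two-segment section described in the text. It is a numerical stand-in, not the closed-form derivation of Appendix A, and it rests on the following assumptions:

- Each driver plus segment wire is a resistor $R_i$ that connects its node to ground before its input switches and to $V_{DD}$ afterward.
- Each output node carries $C_i + \tfrac{1}{2}C_X$.
- The two nodes are joined by $R_X$.
- $Clk_{In2}$ switches at t = 0 and $Clk_{In1}$ switches at t = T.

The function name `simulate_section` and all parameter values are hypothetical. The function reports the output skew at $\tfrac{1}{2}V_{DD}$ and the energy drawn from the supply.

```python
def simulate_section(R1, R2, C1, C2, R_X, C_X, T, vdd=1.0, steps_per_tau=50):
    """Return (output skew, supply energy) for the assumed lumped section model.

    Skew is t50(Clk_Out1) - t50(Clk_Out2), where t50 is the 50% crossing time.
    """
    ca = C1 + 0.5 * C_X  # half of the crosslink capacitance lumped at each end
    cb = C2 + 0.5 * C_X
    # Conservative bound on the fastest time constant keeps forward Euler stable.
    tau_min = min(R1, R2, R_X) * min(ca, cb) / 3.0
    dt = tau_min / steps_per_tau
    t_end = T + 20.0 * max(R1 * ca, R2 * cb)  # long enough to fully charge both nodes
    half = 0.5 * vdd
    v1 = v2 = 0.0
    t = energy = 0.0
    cross1 = cross2 = None
    while t < t_end:
        src1 = vdd if t >= T else 0.0  # driver 1 holds node 1 low until Clk_In1 switches
        i1 = (src1 - v1) / R1          # current from driver 1 into node 1
        i2 = (vdd - v2) / R2           # driver 2 pulls node 2 high from t = 0
        ix = (v2 - v1) / R_X           # crosslink current from node 2 to node 1
        # Only current drawn from V_DD consumes supply energy.
        energy += vdd * (i2 + (i1 if src1 > 0.0 else 0.0)) * dt
        n1 = v1 + (i1 + ix) * dt / ca
        n2 = v2 + (i2 - ix) * dt / cb
        # Linearly interpolate the first 50% crossing of each output.
        if cross1 is None and v1 < half <= n1:
            cross1 = t + dt * (half - v1) / (n1 - v1)
        if cross2 is None and v2 < half <= n2:
            cross2 = t + dt * (half - v2) / (n2 - v2)
        v1, v2, t = n1, n2, t + dt
    return cross1 - cross2, energy


if __name__ == "__main__":
    R, C, CX, T = 100.0, 100e-15, 20e-15, 5e-12  # hypothetical values
    for rx in (1e12, 500.0, 20.0):  # 1e12 ohm approximates an open crosslink
        skew, energy = simulate_section(R, R, C, C, rx, CX, T)
        print(f"R_X = {rx:g} ohm: skew = {skew * 1e12:.2f} ps, energy = {energy * 1e15:.1f} fJ")
```

In this sketch, a lower $R_X$ reduces the skew but raises the supply energy. The extra energy comes from the charge dumped to ground through $R_1$ while the inputs are skewed. For a symmetric section with T = 0, no current flows through the crosslink, and the energy reduces to $(C_1 + C_2 + C_X)V_{DD}^2$.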
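The skew with a crosslink, (6), is similar in form to the expression $T_X = T \cdot 2^{-2R/R_X}$ in [15]: the skew between the crosslink connected segments decreases as the crosslink resistance $R_X$ is reduced.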
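Taking the form $T_X = T \cdot 2^{-2R/R_X}$ from [15] as a stand-in for (6), an assumption made only for this sketch, the zero skew constraint $T_X \le T_{TH}$ can be solved for the largest admissible crosslink resistance. The constraint $2^{-2R/R_X} \le T_{TH}/T$ gives $R_X \le 2R/\log_2(T/T_{TH})$. The helper names below are hypothetical.

```python
import math


def skew_with_crosslink(T, R, R_X):
    """Residual skew T_X = T * 2**(-2R/R_X) (form quoted from [15]; assumed here)."""
    return T * 2.0 ** (-2.0 * R / R_X)


def max_crosslink_resistance(T, R, T_TH):
    """Largest R_X with T_X <= T_TH under the same assumed model.

    From T * 2**(-2R/R_X) <= T_TH:  R_X <= 2R / log2(T / T_TH).
    """
    if T <= T_TH:
        return math.inf  # skew already within the threshold; no crosslink needed for skew
    return 2.0 * R / math.log2(T / T_TH)


if __name__ == "__main__":
    # Hypothetical numbers: 100 ps skew (10% of a 1 ns period), 50 ps threshold, R = 200 ohm.
    r_max = max_crosslink_resistance(100e-12, 200.0, 50e-12)
    print(r_max)  # 400.0
    print(skew_with_crosslink(100e-12, 200.0, r_max))  # residual skew at the threshold
```

Choosing $R_X$ at this upper limit, rather than lower, matches the low power preference for the largest resistance that still meets the skew threshold.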

Guidelines for Crosslink Insertion in a Clock Distribution Network
To design an efficient crosslink-based network, three decisions must be made regarding the crosslinks: (1) which pairs of clock tree segments should be connected by a crosslink, (2) where within each pair of segments the crosslink should be placed, and (3) the physical characteristics of the crosslinks. Guiding principles for crosslink insertion are provided in this section based on the analytic expressions described in Section 2.4.1.

Rule 1: Location of Crosslinks within a Clock Tree
The first design issue is determining between which segments to insert a crosslink to reduce skew variations between sequentially-adjacent registers, while preserving useful skew in balanced clock trees. Any two clock tree segments located upstream from a pair of sequentially-adjacent registers, $Reg_1$ and $Reg_2$, can be connected with a crosslink to mitigate skew variations between the sequentially-adjacent sinks, as depicted in Figure 12. Inserting a crosslink between two segments lowers the delay variations within the clock signal paths in the upper levels (the dashed lines in Figure 12) and, as a result, reduces skew variations between the registers (the shaded nodes at the sink level in Figure 12). Crosslinks connecting segments at the upper clock tree levels affect the clock delay to all of the downstream registers, mitigating skew variations within a larger group of sequentially-adjacent registers, as illustrated in Figure 12a. Alternatively, lower skew variations at the sinks are observed when the crosslinks connect segments close to the sinks (see Figure 12b). However, according to the heuristic for crosslink insertion (see Section 2.4.1), only zero skew segments should be connected to preserve the skew between sequentially-adjacent registers. Thus, to minimize skew variations while preserving useful skews, crosslinks should be inserted close to the sinks between zero skew segments with expected skew variations greater than the allowed skew variation threshold $T_{TH}$.
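The selection in Rule 1 can be sketched as a filter over candidate segment pairs. The `Candidate` record and the `select_crosslinks` function below are hypothetical. Each candidate is assumed to carry its tree level, the designed (nominal) skew between the two segments, and the predicted skew variation at the downstream register pair. For each register pair, the sketch keeps only zero skew candidates whose expected variation exceeds $T_{TH}$ and, among those, the candidate closest to the sinks.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record: two clock tree segments upstream from a register pair."""
    reg_pair: tuple            # sequentially-adjacent registers, e.g. ("Reg1", "Reg2")
    segments: tuple            # the two segments a crosslink would connect
    level: int                 # tree level of the segments; larger = closer to the sinks
    nominal_skew: float        # designed skew between the two segments (s)
    expected_variation: float  # predicted skew variation at the register pair (s)


def select_crosslinks(candidates, t_th, zero_tol=1e-15):
    """Rule 1 sketch: per register pair, pick the zero skew segment pair
    closest to the sinks, but only if the expected variation exceeds t_th."""
    chosen = {}
    for c in candidates:
        if abs(c.nominal_skew) > zero_tol:
            continue  # connecting non-zero skew segments could disturb useful skew
        if c.expected_variation <= t_th:
            continue  # variation already within the allowed threshold
        best = chosen.get(c.reg_pair)
        if best is None or c.level > best.level:
            chosen[c.reg_pair] = c
    return [c.segments for c in chosen.values()]


if __name__ == "__main__":
    t_th = 50e-12  # e.g., 5% of a 1 ns clock period
    candidates = [
        Candidate(("Reg1", "Reg2"), ("s1", "s2"), 1, 0.0, 80e-12),
        Candidate(("Reg1", "Reg2"), ("s5", "s6"), 3, 0.0, 80e-12),
        Candidate(("Reg1", "Reg2"), ("s7", "s8"), 4, 20e-12, 80e-12),  # non-zero skew
        Candidate(("Reg3", "Reg4"), ("s9", "s10"), 3, 0.0, 30e-12),    # below threshold
    ]
    print(select_crosslinks(candidates, t_th))  # [('s5', 's6')]
```

Treating "zero skew" as a nominal skew below a small tolerance is an assumption of this sketch. In practice, the tolerance would follow from the clock skew schedule.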

Rule 2: Location of Crosslink within a Clock Tree Section
The second design issue is determining the location of the crosslink between two zero skew segments.Skew variations between two zero skew nodes can be regulated by inserting a crosslink between the nodes.Thus, the primary objective in choosing the specific location of the crosslink within a clock tree section is to lower the total energy consumption.The additional energy from the crosslink is the sum of the dynamic energy due to the added wire capacitance and the short-circuit energy dissipated between the crosslink-connected segments.The additional dynamic energy is not significantly affected by the specific crosslink location within the clock tree segment.Alternatively, inserting a crosslink far from the input driver of a section increases the short-circuit path resistance, decreasing the total energy consumption.
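A resistive estimate illustrates Rule 2. This sketch assumes the following:

- Both segments have the same wire resistance per unit length r.
- The crosslink taps both segments at the same distance x from their drivers.
- Each driver acts as a linear resistor $R_{ON}$.

Under these assumptions, the path traversed by the short-circuit current while the inputs have opposite polarities has resistance $2R_{ON} + 2rx + R_X$. Ignoring capacitive charging, $V_{DD}$ divided by this resistance is the DC current that would flow if the inputs remained at opposite polarities. This serves as a simple proxy for the short-circuit current, and it decreases as the tap moves away from the drivers. The function names and values are hypothetical.

```python
def short_circuit_path_resistance(r_on, r_per_m, x, r_x):
    """Assumed resistive path: V_DD -> driver 2 -> segment 2 wire (length x)
    -> crosslink -> segment 1 wire (length x) -> driver 1 -> ground."""
    return 2.0 * r_on + 2.0 * r_per_m * x + r_x


def dc_short_circuit_current(vdd, r_path):
    """Current that would flow if the inputs stayed at opposite polarities
    (capacitive charging ignored); a simple proxy for the short-circuit current."""
    return vdd / r_path


if __name__ == "__main__":
    # Hypothetical values: R_ON = 100 ohm, 0.1 ohm/um wire, R_X = 50 ohm, V_DD = 1.8 V.
    for x in (0.0, 100e-6, 500e-6):
        r = short_circuit_path_resistance(100.0, 1e5, x, 50.0)
        i = dc_short_circuit_current(1.8, r)
        print(f"x = {x * 1e6:.0f} um: R_path = {r:.0f} ohm, I = {i * 1e3:.2f} mA")
```

The added dynamic energy of the crosslink itself does not depend on x in this simple model, which is consistent with the observation that the crosslink location mainly affects the short-circuit component.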

Rule 3: Crosslink Parameters
The third design issue is the physical characteristics of the crosslink placed between the segments. Given a crosslink X of specific length l and resistivity ρ, an increase in either the width w or thickness t results in a higher capacitance $C_X$ and lower resistance $R_X$ (see Appendix B). A higher $R_X$ and lower $C_X$ should therefore be used to reduce both the short-circuit and total power consumption. Thus, crosslinks with a smaller width and thickness, and therefore higher resistance, should be inserted in low power circuits. Alternatively, a lower $R_X$, and therefore a higher $C_X$, should be used to reduce skew at the expense of higher power. The crosslink characteristics for efficient crosslink-based networks are described quantitatively in Section 3 under specific skew and power constraints.
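Rule 3 suggests a simple sizing step when a maximum crosslink resistance $R_{X,max}$ is imposed by the skew constraint. This sketch makes two assumptions, unlike the joint optimization over w and t described in Section 3. First, the thickness t is fixed by the chosen metal layer. Second, $C_X$ grows monotonically with w. Under these assumptions, the lowest-power crosslink is the narrowest one with $\rho l/(wt) \le R_{X,max}$, clipped at a minimum design-rule width. The function name and the limit `w_min` are hypothetical.

```python
def min_power_crosslink_width(rho, l, t, r_x_max, w_min):
    """Narrowest width with R_X = rho*l/(w*t) <= r_x_max at fixed thickness t.

    Assumes C_X increases monotonically with w, so the narrowest feasible
    width has the smallest C_X; w_min is a hypothetical design-rule minimum.
    """
    w_at_limit = rho * l / (t * r_x_max)  # width at which R_X equals r_x_max
    return max(w_at_limit, w_min)


if __name__ == "__main__":
    # Hypothetical: 200 um crosslink, rho = 2e-8 ohm*m, t = 0.5 um, minimum width 0.3 um.
    for r_x_max in (20.0, 400.0):
        w = min_power_crosslink_width(2e-8, 200e-6, 0.5e-6, r_x_max, 0.3e-6)
        print(f"R_X,max = {r_x_max:g} ohm -> w = {w * 1e6:.2f} um")
```

When the design-rule minimum is the binding limit, the resulting $R_X$ is below $R_{X,max}$. The skew constraint is then met with margin, but no further capacitance reduction is available.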

Metrics for Power Efficient Clock Networks: Crosslink vs. Mesh-Based Topologies
Low wirelength utilization, the availability of efficient techniques for locally controlling skew, and lower power are important advantages of tree-based clock distribution networks as compared to non-tree topologies.The reliability of clock trees in high performance variation-sensitive circuits is however reduced.Thus, in moderate and low performance circuits with aggressive power and area constraints, clock trees are preferable.Alternatively, in high performance circuits, non-tree topologies are preferred.
Non-tree clock networks are shown here to be an efficient alternative to a tree topology for coping with skew variations within clock distribution trees. Two zero skew segments upstream from a sequentially-adjacent, variation-sensitive pair of registers should be connected with a crosslink to mitigate these variations. Thus, to attenuate predicted skew variations between N pairs of registers, at most N pairs of zero skew segments should be connected by N crosslinks, as shown in Figure 13. At the limit, for large values of N, a crosslink-based topology uses more wirelength than a mesh, dissipating higher power, as illustrated in Figure 14. Alternatively, skew variations can be mitigated at lower power by inserting crosslinks in those circuits with fewer sequentially-adjacent register pairs (smaller N) (see Figure 14). A comparison between crosslink and mesh-based topologies is discussed in this section. The energy budget available for adding crosslinks should not exceed the additional energy of a mesh, $E_{Mesh}$. Skew and skew variations between connected segments are reduced with smaller $R_X$ and, therefore, larger $C_X$ (see (6)), thereby dissipating more power (see (3)). To minimize the power dissipated by low power clock networks, crosslinks with the largest possible $R_X$ and smallest $C_X$ should be used under the zero skew constraint $T_X \le T_{TH}$, which, based on (6), yields an upper bound on $R_X$. Given a crosslink X of specific length l and resistivity ρ, the width w and thickness t are the only factors that affect the crosslink resistance $R_X = \rho l/(wt)$. The resistance constraint is therefore a constraint on w and t. Applying the method of Lagrange multipliers to determine the constrained minima of the closed-form expressions [28] yields the minimum crosslink capacitance for low power clock networks that still satisfies the zero skew constraint $T_X \le T_{TH}$, as described by (8)-(14). Crosslinks up to hundreds of micrometers long are routed in the lower metal layers. The capacitance of these crosslinks is determined from local and intermediate interconnect models, (9), that consider wire coupling between the upper and lower metal layers [28]. Alternatively, crosslinks that connect distant segments (thousands of micrometers and longer) should be routed on the top metal layer and modeled as a global interconnect, (11), that only couples with the lower metal layer [28].

In the optimum crosslink expressions for local, intermediate, and global interconnect, $w_{OPT}$ and $t_{OPT}$ are the crosslink width and thickness, respectively, that exhibit minimum power under the specific timing constraint.
The variables α, β, γ, and δ vary with technology-dependent parameters, such as the interconnect resistivity ρ, horizontal spacing s, and vertical spacing h, as described in Appendix B. The upper bound on the minimum total energy in a clock tree section with a crosslink under the zero skew constraint $T_X \le T_{TH}$ is determined by substituting the optimum crosslink parameters into the upper bound of the total energy. The upper bound on the additional energy $E_{X,MAX}$ from inserting a crosslink is then determined by subtracting the energy consumed by a clock tree section without a crosslink. Finally, the total additional energy from inserting N crosslinks within a clock tree is compared with the additional mesh energy $E_{Mesh}$.
The expression in (17) can be further simplified, yielding the energy efficiency metric (18). A crosslink-based topology should therefore be used to provide low power while mitigating skew variations when $E_{X,MAX} < E_{Mesh}$. Otherwise, a mesh-based clock distribution network is preferable.
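The decision rule can be written as a small function. In this sketch, the hypothetical `prefer_crosslinks` sums per-crosslink upper bounds on the additional energy to form $E_{X,MAX}$ for N crosslinks, which is an interpretation of the total additional energy described above. It also requires that the crosslinks meet the skew threshold, reflecting the example in the simulation results where a crosslink solution cannot satisfy the skew requirements. The per-crosslink energy bounds are taken as inputs rather than computed.

```python
def prefer_crosslinks(crosslink_energy_bounds, e_mesh, crosslinks_meet_skew=True):
    """Hypothetical decision helper for the energy efficiency metric.

    crosslink_energy_bounds: upper bound on the additional energy of each of the
    N crosslinks (J); their sum is taken as E_X,MAX for the whole network.
    e_mesh: additional energy of the candidate mesh (J).
    """
    e_x_max = sum(crosslink_energy_bounds)
    return crosslinks_meet_skew and e_x_max < e_mesh


if __name__ == "__main__":
    bounds = [0.1e-12] * 4  # hypothetical: four crosslinks at 0.1 pJ each
    print("crosslinks" if prefer_crosslinks(bounds, 1e-12) else "mesh")  # crosslinks
```

As N grows, the summed bound eventually exceeds $E_{Mesh}$, and the helper switches to recommending a mesh.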

Simulation Results
Several examples of moderate and large skew variations that effectively exploit non-tree topologies are described in this section for a zero skew clock tree and for a clock tree with specific useful non-zero skew constraints. Different mesh- and crosslink-based topologies are considered. The choice of non-tree topology is based on the energy efficiency metric ($E_{X,MAX} < E_{Mesh}$) and is corroborated with SPICE simulations. Crosslink-based networks have been designed based on the analytic expressions for the optimal crosslink parameters, (8)-(11), and validated with simulations. A portion of a clock tree with four levels of buffers and 16 sequentially-adjacent registers in a 180 nm CMOS technology is considered. The source of the clock distribution network is driven by a 1 GHz clock signal. Transistor and interconnect parameters from [28] are used to model the drivers and wires within the clock network. The wires at the topmost and lowest clock tree levels are modeled, respectively, by the global and local interconnect parameters [28]. The interconnect parameters for the intermediate layers [28] are used to model the clock lines within the second and third clock tree levels. The threshold for the allowed skew variations is set to 5% of the clock period ($T_{TH} = 0.05\,T_P$, i.e., 50 ps at 1 GHz). The transistor and wire widths within the clock tree are varied by 20% to 50% of the nominal value. As a result, skew variations as high as 10% of the clock period $T_P$ are observed at the registers, exceeding the 5% threshold $T_{TH}$. Crosslink and mesh-based solutions for mitigating skew variations between sequentially-adjacent registers are compared. The crosslinks are inserted according to the guidelines provided in Sections 2.4.1 and 3. Both intermediate- and sink-level sparse and dense meshes [8] are used in the zero skew clock tree. For the clock tree with a specific useful skew, all of the meshes and crosslinks are restricted to the upper clock tree levels to maintain the non-zero skew between the specific registers. To determine the preferred non-tree solution for the example networks, the power efficiency of each method is evaluated based on (18).
For the zero skew clock tree, the largest skew, number of skew violations between sequentially-adjacent registers, and additional energy due to the inserted crosslinks or mesh connections are listed in Tables 1 and 2 for, respectively, moderate (up to 20%) and large (up to 50%) skew variations.
Analogous results for the non-zero skew clock tree are listed in Tables 3 and 4, respectively, for moderate and large skew variations. In each example, locally and globally routed crosslinks are considered, respectively, for close and distant crosslink connected segments. Both uniform sparse and dense meshes are considered. Typical mesh parameters are based on [8]. For crosslink-based topologies, the crosslink parameters are based on (8)-(11), exhibiting skew variations slightly below the allowed threshold $T_{TH}$ while satisfying the zero skew constraint between the crosslink connected nodes. High correlation is observed between the analytic expressions and the simulation results. Based on SPICE simulations for the case of moderate skew variations (Tables 1 and 3), skew mitigation with crosslinks and with a mesh is similar. However, higher power is consumed by the mesh-based clock distribution network. Alternatively, in clock trees with larger skew variations (Tables 2 and 4), the target skew cannot always be achieved with an intermediate- or sink-level sparse mesh (Table 2). Thus, a dense mesh is used at the expense of higher power. Specifically, in the case of the zero skew clock tree (Table 2), the crosslink-based solution is preferred due to the lower power dissipated by the crosslinks as compared to the dense sink-level mesh (consistent with $E_{X,MAX} < E_{Mesh}$). Alternatively, for the clock tree with non-zero skew constraints (Table 4), the maximum skew with crosslinks exceeds the required 50 ps threshold. Hence, an intermediate-level mesh is preferable.
An analytic estimate of the energy, which is used to determine the preferred clock topology for the specific examples, is also listed in Tables 1-4. In Tables 1-3, the upper bound $E_{X,MAX}$ on the energy consumed by the additional crosslinks is lower than $E_{MESH}$, indicating that a crosslink-based solution is preferable in these cases. In Table 4, however, the skew requirements cannot be satisfied with the proposed crosslink-based solution, and inserting additional crosslinks would increase the dissipated power until $E_{X,MAX}$ eventually exceeds $E_{MESH}$. In these examples, the topology choice based on the energy efficiency metric is thereby confirmed by SPICE simulations.
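The selection logic described above can be sketched as a small function. It is a hedged generalization, not the paper's procedure: the pairwise test ($E_{X,MAX} < E_{MESH}$, with the skew threshold met) is extended to a list of candidates, and the lowest-energy candidate that meets the threshold is chosen. The class `Candidate`, the function `select_topology`, and all numbers are hypothetical illustrations, not values from Tables 1-4.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    name: str
    max_skew_ps: float  # largest skew after mitigation (simulated or estimated)
    energy: float       # additional energy; pass the upper bound E_X,MAX for crosslinks

def select_topology(candidates: List[Candidate], threshold_ps: float) -> Optional[str]:
    """Return the lowest-energy candidate meeting the skew threshold, or None."""
    feasible = [c for c in candidates if c.max_skew_ps <= threshold_ps]
    if not feasible:
        return None  # no candidate meets the timing requirement
    return min(feasible, key=lambda c: c.energy).name

# Hypothetical case in which the crosslinks miss a 50 ps threshold
options = [Candidate("crosslinks", 55.0, 1.0),
           Candidate("intermediate-level mesh", 40.0, 2.0),
           Candidate("dense sink-level mesh", 30.0, 3.0)]
print(select_topology(options, 50.0))
```

Using the upper bound $E_{X,MAX}$ for the crosslink candidate keeps the comparison conservative: if even the bound is below $E_{MESH}$, the crosslink-based topology is the lower-power choice.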
Note that, as the aforementioned examples show, either a mesh or a crosslink-based topology may be the more efficient solution, depending on the clock network. The purpose of this work, as demonstrated by the example networks, is to provide metrics for determining the more efficient non-tree method to mitigate skew variations, rather than to suggest a general topology for any clock network.

Summary
Different topologies and techniques for designing power efficient clock distribution networks at several operating frequencies are reviewed in this paper. For low power circuits that operate at moderate and low frequencies, a buffered clock tree may be preferable. To satisfy a specific set of timing constraints, a balanced, low power clock tree can be efficiently designed using existing techniques. Existing skew solutions in tree-based networks, however, do not efficiently mitigate manufacturing-induced variations. Thus, in modern circuits with aggressive timing requirements, non-tree topologies should be considered to cope with skew variations. Mesh-based solutions have recently been shown to reliably mitigate skew variations through a symmetric mesh structure, albeit at significantly higher power. Alternatively, crosslink-based topologies avoid mesh redundancy and mitigate skew variations at lower power.
Guidelines for crosslink insertion in a balanced clock tree are presented in this paper. To maintain a target skew between sequentially-adjacent registers, a heuristic is proposed that inserts crosslinks between zero skew segments of a balanced clock tree upstream of those sequentially-adjacent registers that violate timing constraints. In addition, a crosslink should be inserted as far as possible from the section drivers to enhance tolerance to variations at lower power. The optimum crosslink parameters under zero skew constraints are also presented. Tradeoffs between energy consumption and skew variations in crosslink-based topologies are investigated based on analytic expressions and simulations. These results demonstrate that crosslinks with lower resistance should be used to enhance the tolerance of a circuit to manufacturing-induced variations, whereas crosslinks with higher resistance, and therefore lower capacitance, should be used in low power clock networks. Analytic expressions are also described to determine the most power efficient clock network topology under specific timing constraints. Simulation results confirm the conclusions of the theoretical analysis regarding the choice of topology for low power, variation-tolerant clock networks.
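One way to sketch the insertion guideline, under loudly stated assumptions: the tree is balanced, so nodes at the same depth are treated as zero skew by construction, and candidate crosslinks for a violating register pair are the equal-depth node pairs on the two branches below the pair's lowest common ancestor, listed deepest first. The deepest pair is farthest from the section drivers (the sink-side option of Figure 12b); shallower pairs correspond to upper-level crosslinks covering a larger group of registers (Figure 12a). The functions `path_to_root`, `depth`, and `crosslink_candidates`, and the depth-as-zero-skew proxy, are hypothetical and not the paper's exact heuristic.

```python
from typing import Dict, List, Tuple

def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def depth(node: str, parent: Dict[str, str]) -> int:
    return len(path_to_root(node, parent)) - 1

def crosslink_candidates(reg_a: str, reg_b: str,
                         parent: Dict[str, str]) -> List[Tuple[str, str]]:
    """Equal-depth node pairs on the two branches below the LCA, deepest first."""
    path_a = path_to_root(reg_a, parent)[1:]   # tree nodes driving reg_a
    path_b = path_to_root(reg_b, parent)[1:]
    on_path_b = set(path_b)
    lca = next(n for n in path_a if n in on_path_b)
    branch_a = path_a[:path_a.index(lca)]      # nodes strictly below the LCA
    branch_b = path_b[:path_b.index(lca)]
    # Assumption: in a balanced tree, equal depth implies zero nominal skew
    pairs = [(u, v) for u in branch_a for v in branch_b
             if depth(u, parent) == depth(v, parent)]
    return sorted(pairs, key=lambda p: depth(p[0], parent), reverse=True)

# Hypothetical two-level tree: registers r1 and r2 sit under different halves
parent = {"n1": "root", "n2": "root", "s1": "n1", "s2": "n1",
          "s3": "n2", "s4": "n2", "r1": "s1", "r2": "s3"}
print(crosslink_candidates("r1", "r2", parent))
```

In this toy tree, connecting s1 and s3 is the sink-side option, while n1 to n2 is the upper-level alternative; registers driven by the same segment return no candidates, since no crosslink is needed between them.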

Appendix A: Total Energy Consumed in a Clock Tree Section with a Crosslink
The voltage at the output of a clock tree section and the energy expressions for t > 0 are derived in this section based on the circuit model shown in Figure 9. The simplified clock tree section with a crosslink is modeled for 0 < t ≤ T with the inputs at different polarities (Clk In1 ≠ Clk In2) in Figure 9a, and for t > T with identical inputs (Clk In1 = Clk In2) in Figure 9b.
Differential equations for Clk In1 ≠ Clk In2, t ≤ T, are determined from Figure 9a with the corresponding initial conditions, and the total energy consumed during the time interval [0, T] is obtained from their solution. To determine the additional energy consumed for t > T, two differential equations, (A.6) and (A.7), are determined from Figure 9b for Clk In1 = Clk In2, with initial conditions given by the node voltages at t = T. The energy consumed for t > T, obtained by solving (A.6)-(A.7) with these initial conditions, is dynamic energy and converges to a finite value as the output capacitors charge. The total energy consumption, from the time the first input Clk In2 switches until the output capacitors are fully charged, is the sum of the energy consumed during [0, T] and the energy consumed for t > T.
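The energy penalty of a crosslink during the interval T can be reproduced numerically with a deliberately simple lumped model. This is an assumption, not the exact circuit of Figure 9: each section driver is a linear resistance `r_d` to its input level, each output node has capacitance `c`, and the crosslink is a pure resistance `r_x` whose capacitance is ignored. The leading input switches at t = 0 and the lagging one at t = T (`skew`); forward Euler integration accumulates the energy drawn from V_DD. The function name `supply_energy` and all parameter values are hypothetical.

```python
def supply_energy(r_d: float, r_x: float, c: float, vdd: float,
                  skew: float, t_end: float, dt: float) -> float:
    """Energy (J) drawn from V_DD by two RC sections joined by a resistive crosslink.

    Input 1 rises at t = 0 and input 2 at t = skew (the interval T).
    Drivers are modeled as linear resistors r_d to their input level (assumption).
    """
    v1 = v2 = 0.0      # output node voltages, both initially discharged
    energy = 0.0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        in1 = vdd                               # leading input, high for t >= 0
        in2 = vdd if t >= skew else 0.0         # lagging input
        i1 = (in1 - v1) / r_d                   # driver 1 current into node 1
        i2 = (in2 - v2) / r_d                   # driver 2 current into node 2 (< 0 while low)
        ix = (v1 - v2) / r_x                    # crosslink current, node 1 -> node 2
        # Only a driver whose input is high draws charge from the supply;
        # a low driver sinks current to ground (the short-circuit path).
        drawn = (i1 if in1 > 0.0 else 0.0) + (i2 if in2 > 0.0 else 0.0)
        energy += vdd * drawn * dt
        v1 += dt * (i1 - ix) / c                # forward Euler node updates
        v2 += dt * (i2 + ix) / c
    return energy

C, VDD = 100e-15, 1.0   # hypothetical 100 fF loads and 1 V supply
print(supply_energy(100.0, 50.0, C, VDD, 0.0, 300e-12, 0.01e-12) / (C * VDD**2))  # close to 2.0
print(supply_energy(100.0, 50.0, C, VDD, 20e-12, 300e-12, 0.01e-12) / (C * VDD**2))
```

With T = 0 no current flows through the crosslink and the result is the familiar 2CV_DD² for charging both loads. With T > 0 the lagging driver sinks current through the crosslink, adding short-circuit energy that grows as `r_x` decreases, which is consistent with the resistance and power tradeoff summarized in this paper.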

Appendix B: Crosslink Parameters for Low Power Design under the Zero Skew Constraint
The optimum crosslink resistance under the zero skew constraint $T_X \le T_{TH}$ is described in Section 3, permitting the optimum crosslink capacitance to be determined from the wire capacitance. Capacitive coupling should, however, be considered to produce an accurate wire model, which complicates the analytic expressions for the total wire capacitance. In this section, the optimum crosslink capacitance is derived for minimum power under the $T_X \le T_{TH}$ constraint and for a specific crosslink resistance.
Skew variations between crosslink-connected segments decrease with lower crosslink resistance $R_X = \rho l/(w t)$, where l, w, and t are, respectively, the wire length, width, and thickness, as illustrated in Figure B1. However, when mitigating variations between zero skew segments, the zero skew constraint $T_X \le T_{TH}$ should be enforced with a crosslink, yielding the optimum crosslink resistance $R_{X,OPT}^{T_X \le T_{TH}}$ that satisfies $T_X \le T_{TH}$ at minimum power, as shown in Figure B2. For a crosslink of length l with this resistance, the cross-sectional area is fixed at $w t = M = \rho l / R_{X,OPT}^{T_X \le T_{TH}}$. Note that w and t range within $[w_{min}, M/t_{min}]$ and $[t_{min}, M/w_{min}]$, respectively, as illustrated in Figure B4, where $w_{min}$ and $t_{min}$ are determined by the minimum geometric feature size. Thus, by the Weierstrass extreme value theorem [30], the crosslink capacitance $C_X$, being continuous over this closed and bounded set of feasible dimensions, attains a minimum within that set. Based on technology parameters from [28] and the interconnect geometry (see Figure B1), the crosslink capacitance is $C_{X1} = 2C_{g1} + 2C_{C1}$ for the local and intermediate layers and $C_{X2} = C_{g2} + 2C_{C2}$ for the global interconnect, where $C_{C1}$ and $C_{C2}$ are, respectively, the coupling capacitance for the local and intermediate, and global interconnect. Thus, given an interconnect length l, spacing s, and distance to the ground h, the crosslink capacitance is expressed as a function of the width w and thickness t. Note that the derivatives $\partial C_X/\partial w$ and $\partial C_X/\partial t$ are always positive; therefore, the crosslink capacitance $C_X$ increases with wider and thicker crosslinks. Furthermore, the optimum crosslink capacitance $C_{X,OPT}^{T_X \le T_{TH}}$ can be derived with the Lagrange method by minimizing $f(w, t) = C_X(w, t)$ under the constraint $g(w, t) = w t = M$. To optimize $y = f(w, t)$ subject to $g(w, t) = M$, the auxiliary function $L(w, t, \lambda) = f(w, t) + \lambda\,[g(w, t) - M]$ is formed. The partial derivatives of L with respect to each of the variables are determined, assuming $w \approx s$ and $t = \eta h$, and set to zero, yielding $\partial L/\partial w = 0$, $\partial L/\partial t = 0$, and $\partial L/\partial \lambda = 0$. The stationary point $(w_{STAT}, t_{STAT})$, at which the gradient of L equals zero, is expressed in terms of the constants α, β, γ, and δ, which are based on technology dependent parameters such as the interconnect resistivity ρ, horizontal spacing s, and vertical spacing h [28].
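The Lagrange formulation described above can be written out explicitly. As a sketch, the constraint follows from fixing the crosslink resistance: since $R_X = \rho l/(w t)$, holding $R_X$ at its optimum value fixes the product $w t$. The specific capacitance model $C_X(w, t)$ from [28] is not reproduced here, so it is left generic.

```latex
% Constraint from the fixed crosslink resistance R_X = \rho l / (w t)
M = w\,t = \frac{\rho\, l}{R_{X,OPT}^{T_X \le T_{TH}}}

% Auxiliary (Lagrange) function: minimize C_X(w,t) subject to w t = M
L(w,t,\lambda) = C_X(w,t) + \lambda\,(w\,t - M)

% Stationarity conditions
\frac{\partial L}{\partial w} = \frac{\partial C_X}{\partial w} + \lambda\, t = 0, \qquad
\frac{\partial L}{\partial t} = \frac{\partial C_X}{\partial t} + \lambda\, w = 0, \qquad
\frac{\partial L}{\partial \lambda} = w\,t - M = 0

% Eliminating \lambda from the first two conditions
w\,\frac{\partial C_X}{\partial w} = t\,\frac{\partial C_X}{\partial t}, \qquad \lambda < 0
```

Because both partial derivatives of $C_X$ are positive, the multiplier is negative, and the stationary point balances the marginal capacitance of widening the crosslink against that of thickening it.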

Figure 1. Power vs. clock skew variations for different clock network topologies.

Figure 2. Clock tree composed of the source, trunk, segments, and sinks.

Figure 4. An example of the excessive wirelength and power of a mesh as compared to a crosslink-based topology.

Figure 5. Two clock tree segments (a) with an impedance model of the crosslink and (b) without a crosslink.

Figure 6. Two clock tree segments with an impedance model (a) with a crosslink and (b) without a crosslink.

Figure 7. Two clock tree segments connected with a crosslink. The dotted line illustrates the short-circuit current path for Clk In1 = "0" and Clk In2 = "1".

Figure 8. Current through $R_1$ for (a) a step input and (b) a slow ramp input. The negative currents prior to T = 500 ps degrade the performance.

Figure 12. Mitigation of skew variations between Reg 1 and Reg 2 with a crosslink. A crosslink should be inserted (a) at an upper clock tree level to reduce variations within a larger group of four registers, or (b) closer to the sinks to effectively cancel variations between Reg 1 and Reg 2.

Figure 14. Power efficient non-tree topologies to mitigate skew variations (a) with a crosslink between four sequentially-adjacent registers and (b) with a mesh within a large group of sequentially-adjacent registers, as opposed to (c) an inefficient crosslink-based clock network.

Figure B2. Optimum crosslink resistance $R_{X,OPT}^{T_X \le T_{TH}}$ under the $T_X \le T_{TH}$ constraint.

Figure B3. Optimum crosslink capacitance $C_{X,OPT}^{T_X \le T_{TH}}$ under the $T_X \le T_{TH}$ constraint for $w \cdot t = M$.

Figure B4.Bounds of the crosslink width and thickness based on the minimum feature size and w•t = M constraint.

Table 1. Comparison of different non-tree approaches to mitigate moderate (up to 20%) skew variations within a zero skew clock tree.

Table 2. Comparison of different non-tree approaches to mitigate large (up to 50%) skew variations within a zero skew clock tree.

Table 3. Comparison of different non-tree approaches to mitigate moderate (up to 20%) skew variations within a clock tree with a useful skew schedule.

Table 4. Comparison of different non-tree approaches to mitigate large (up to 50%) skew variations within a clock tree with a useful skew schedule.
Here, $C_{g1}$ and $C_{g2}$ are, respectively, the area and fringe capacitance to the underlying plane for the local and intermediate, and global layers. Higher (lower) values of η within the ranges of (B.19) and (B.20) should be used for thicker (thinner) wires. The equations ∂L/∂w = 0, ∂L/∂t = 0, and ∂L/∂λ = 0 are solved based on (B.1) and (B.10)-(B.12), producing the solution described by (B.13)-(B.20). If the stationary point $(w_{STAT}, t_{STAT})$ lies within the intervals $[w_{min}, M/t_{min}]$ and $[t_{min}, M/w_{min}]$, and $C_X(w_{STAT}, t_{STAT})$ is the minimum value of the crosslink capacitance within these intervals, then $(w_{OPT}, t_{OPT}) = (w_{STAT}, t_{STAT})$; otherwise, the minimum lies on the boundary, at $(w_{min}, M/w_{min})$ or $(M/t_{min}, t_{min})$, whichever yields the lower crosslink capacitance. Finally, $w_{OPT}$ and $t_{OPT}$ are substituted into $C_X$ to determine the optimum capacitance $C_{X,OPT}^{T_X \le T_{TH}}$ under the constraint $T_X \le T_{TH}$ for a crosslink of a specific resistance.
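The stationary-point-or-boundary rule above can be illustrated numerically. This sketch replaces the technology model of [28] with a hypothetical parallel-plate approximation (no fringe terms, and without the $w \approx s$, $t = \eta h$ assumptions): $C_g \approx \varepsilon l w/h$ to each ground plane and $C_C \approx \varepsilon l t/s$ to each neighbor, combined as $C_{X1} = 2C_{g1} + 2C_{C1}$ or $C_{X2} = C_{g2} + 2C_{C2}$. Along $w t = M$ the capacitance becomes $a w + b/w$, which is convex, so clamping its stationary point to $[w_{min}, M/t_{min}]$ gives the constrained minimum. The function names, permittivity, and dimensions are all assumptions.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m

def crosslink_capacitance(w, t, length, h, s, eps_r=3.9, global_layer=False):
    """Hypothetical parallel-plate model: C_g ~ eps*l*w/h, C_C ~ eps*l*t/s.

    Local/intermediate: C_X = 2*C_g + 2*C_C; global: C_X = C_g + 2*C_C.
    """
    eps = eps_r * EPS0
    c_g = eps * length * w / h
    c_c = eps * length * t / s
    return (1.0 if global_layer else 2.0) * c_g + 2.0 * c_c

def optimum_crosslink_geometry(rho, length, r_x, h, s, w_min, t_min,
                               eps_r=3.9, global_layer=False):
    """Return (w_opt, t_opt, C_opt) for a crosslink of fixed resistance r_x."""
    m = rho * length / r_x              # R_X = rho*l/(w*t) pins w*t = M
    if m < w_min * t_min:
        raise ValueError("r_x too high to realize with a minimum-size wire")
    eps = eps_r * EPS0
    k_g = 1.0 if global_layer else 2.0
    a = k_g * eps * length / h          # along w*t = M: C_X(w) = a*w + b/w
    b = 2.0 * eps * length * m / s
    w_stat = math.sqrt(b / a)           # stationary point of a*w + b/w
    # Convexity: clamping to [w_min, M/t_min] yields the boundary case otherwise
    w_opt = min(max(w_stat, w_min), m / t_min)
    t_opt = m / w_opt
    c_opt = crosslink_capacitance(w_opt, t_opt, length, h, s, eps_r, global_layer)
    return w_opt, t_opt, c_opt

# Hypothetical copper crosslink: 100 um long, 20 ohm, 0.1 um minimum features
print(optimum_crosslink_geometry(1.7e-8, 100e-6, 20.0, 0.2e-6, 0.2e-6, 0.1e-6, 0.1e-6))
```

With equal spacing and plane distance (h = s) on a local layer, this toy model gives a square cross-section; a very small spacing pushes the stationary point past $M/t_{min}$, so the optimum moves to the boundary at $t = t_{min}$.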