Low-Process–Voltage–Temperature-Sensitivity Multi-Stage Timing Monitor for System-on-Chip Applications

Abstract: High-performance, complex system-on-chip (SoC) designs require a high-throughput, stable timing monitor to reduce the impact of timing uncertainty and to implement dynamic voltage and frequency scaling (DVFS) schemes for overall power reduction. This paper presents a multi-stage timing monitor that combines three timing-monitoring stages to achieve a high timing-monitoring resolution and a wide timing-monitoring range simultaneously. Additionally, because the proposed timing monitor has high immunity to process–voltage–temperature (PVT) variation, it provides more stable timing-monitoring results. The resolution and range of the proposed timing monitor are 47 ps and 2.2 µs, respectively, and the maximum measurement error is 0.06%. Therefore, the proposed multi-stage timing monitor not only provides the timing information of specified signals to maintain the functionality and performance of the SoC, but also makes the operation of the DVFS scheme more efficient and accurate in SoC design.


Introduction
With the growing demand for electronic products, the complexity of chip design in high-end electronic systems has also increased. To integrate many circuits with different functions into a single chip, the system-on-chip (SoC) implementation method is widely used in modern chip design. Advances in semiconductor technology have significantly improved chip performance, so the operating clock frequency of systems can now reach the gigahertz level. However, systems operating at such high performance with complex SoC designs encounter several challenges. One such challenge is delay uncertainty, such as propagation delay mismatch, clock jitter, and clock skew, which degrades overall system performance and increases the design effort needed to meet timing constraints. Because a large number of circuits with different functions are integrated into the SoC, these problems become significantly more serious. Additionally, the delay of the timing-critical path in the system changes with variations in the operating environment. As the operating clock frequency increases, the timing margins of a high-performance SoC become smaller, which reduces the stability and performance of the chip and can cause malfunction [1][2][3]. Furthermore, negative bias temperature instability (NBTI), hot carrier injection (HCI), and electromigration (EM) induce serious reliability issues after shipment [4]. Consequently, to ensure the functionality and performance of high-performance systems, it is necessary to develop a technology that monitors either the delay of the critical path or the timing uncertainty in the circuit [1][2][3][4][5][6][7][8].
Second, because an SoC has high circuit complexity, reducing overall power consumption is an important issue in SoC design. In addition to reducing the power consumption of individual functional blocks through low-power design, using a dynamic voltage and frequency scaling (DVFS) scheme to adjust the voltage and operating frequency of specified functional blocks is a common low-power design technique in SoCs [9,10]. In the DVFS technique, the main role of a timing monitor is to measure the delay of a specified timing-critical path in a digital block and provide this delay information to the frequency/voltage controller. Based on the measured delay, the DVFS controller adjusts the operating frequency and supply voltage to reduce power consumption. For example, if the delay of the specified timing-critical path is smaller than the system requirement, so that timing slack is available, the system can decrease the operating voltage to reduce power consumption. Conversely, if the delay is larger than the system requirement, the system increases the operating voltage to ensure the functionality of the system. In DVFS systems, a critical path replica circuit is widely used to track the delay of the timing-critical path at different supply voltages [9][10][11]. Therefore, the timing monitor usually measures the delay of the critical path replica circuit. Since accurate timing information determines whether DVFS can effectively reduce SoC power consumption while maintaining functional operation, designing the related monitoring circuits is crucial for SoC design.
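The role of the measured delay in a DVFS loop can be illustrated with a minimal sketch of one control step. It assumes the usual convention that a replica path slower than the timing budget calls for a higher supply voltage and a path with slack allows a lower one. The function name, the 25 mV step, the guard band, and the voltage limits are hypothetical values chosen for illustration; they do not come from the paper.

```python
def dvfs_step(measured_delay_ps, budget_ps, vdd_mv,
              step_mv=25, vdd_min_mv=1620, vdd_max_mv=1980, guard_ps=50):
    """Return the next supply voltage (mV) for one DVFS control step.

    All default values are illustrative assumptions, not values from the paper.
    """
    if measured_delay_ps > budget_ps:
        # Replica path is slower than the budget: raise the supply.
        return min(vdd_mv + step_mv, vdd_max_mv)
    if measured_delay_ps < budget_ps - guard_ps:
        # Slack exceeds the guard band: lower the supply to save power.
        return max(vdd_mv - step_mv, vdd_min_mv)
    # Delay is inside the guard band: hold the current supply.
    return vdd_mv
```

The guard band keeps the controller from toggling the supply when the measured delay sits close to the budget, which is one reason an accurate, stable timing monitor matters for such a loop.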
To overcome the two aforementioned SoC design challenges, the chip should be able to monitor its critical timing. Figure 1 illustrates the role of the timing monitor in an SoC. The timing monitor tracks the timing status of specific signals in the chip, observes situations of uncertain timing, and returns the monitoring results to the control circuit of the system, which in turn triggers corresponding responses to reduce or remove the impact of timing uncertainty on the functionality of the system. For example, if clock skew arises from mismatched clock propagation delays, the timing monitor measures the phase difference between the two clock signals and sends it to the de-skew circuit, which adjusts the clock timing and reduces the impact of the mismatched delay, as shown in Figure 1a. The timing monitor is also employed in the DVFS scheme. By measuring the delay of the critical path replica circuit, it provides the information required to adjust the voltage/frequency of the specified block, as shown in Figure 1b. Therefore, the timing monitor is an indispensable circuit in high-performance SoC design.
Different approaches have been developed to implement a timing monitor. The simplest and most common design uses the delay of a delay element as the monitoring resolution. The start signal propagates through successive delay elements, and the stop signal registers the state of the delay elements, which reveals the number of elements traversed between start and stop [12]. This design concept is very straightforward; however, its quantization resolution is limited by the delay of the delay element, which is not sufficient for advanced system applications. To improve the monitoring resolution, the Vernier delay line (VDL) structure has been proposed. However, such circuits have large hardware costs, and their delay resolutions are sensitive to supply-voltage variations [13].
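The basic delay-line scheme of [12] described above can be sketched behaviourally as follows. This is an idealised model, assuming identical element delays and ideal sampling at the stop edge; the function name and parameters are hypothetical.

```python
def delay_line_tdc(interval_ps, element_delay_ps, n_elements):
    """Idealised delay-line time measurement.

    taps[i] is 1 if the start edge has passed element i when the stop edge
    arrives. The output code is the number of elements traversed, so the
    resolution equals one element delay.
    """
    taps = [1 if (i + 1) * element_delay_ps <= interval_ps else 0
            for i in range(n_elements)]
    return taps, sum(taps)
```

With a 100 ps element, intervals of 400 ps and 499 ps produce the same code, which shows why the element delay bounds the resolution of this approach.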
To improve both monitoring resolution and range, a structure that combines a pseudo-differential ring oscillator and a counter has been proposed [14,15]. This design provides better monitoring resolution and range; however, its monitoring resolution is sensitive to PVT variation. Ref. [5] uses a cascading structure to improve the monitoring resolution and employs two delay lines whose sensitivity to supply-voltage variations is lowered by choosing a suitable width and length for each metal-oxide-semiconductor (MOS) transistor in the two delay lines. However, suitable transistor dimensions are hard to obtain, and this approach does not readily lower the sensitivity to process and temperature variations.
Most SoC applications impose several design considerations on timing monitors, including measurement resolution, measurement range, and measurement response time. The measurement resolution and range determine the accuracy and applicable range of the timing measurement results, respectively. Because different parts of a system place different requirements on the timing monitor, achieving a high measurement resolution and a wide range simultaneously is essential. Additionally, since DVFS applications need the current timing status, the measurement response rate of the timing monitor is an important basis for judging whether it is suitable for SoC applications. Furthermore, if the output of the timing monitor is affected by process–voltage–temperature (PVT) variations, its reliability and stability are degraded. Therefore, this paper proposes a timing monitor that achieves high measurement resolution, a wide measurement range, and low sensitivity to PVT variation.
This paper is organized as follows: Section 2 describes the architecture and operating principle of the proposed timing monitor. Section 3 describes the detailed circuit of each block in the timing monitor. Section 4 presents chip implementation and experimental results. Finally, Section 5 is the conclusion.

Timing-Monitor Architecture
To achieve a high time-monitoring resolution and a wide range simultaneously, a multi-stage timing-monitor architecture is proposed. Figure 2 shows the proposed timing-monitor architecture, which includes a code calculator and three measurement stages, namely, a counter stage, an all-digital delay-locked loop (ADDLL) stage, and an interpolation stage. These three measurement stages have different measurement resolutions and ranges, and when they are properly connected, the overall timing-monitoring range is expanded and the resolution effectively improved. The counter stage has the widest time-monitoring range, followed by the ADDLL stage and, lastly, the interpolation stage. Conversely, the interpolation stage has the highest time-monitoring resolution, followed by the ADDLL stage and then the counter stage. The concept of the multi-stage timing monitor is to speed up the conversion rate through parallel sampling while preserving the monitoring range and resolution.
Since the time interval to be measured is not the same as the time interval measured by the counter stage, the part not measured by the counter stage, ($T_{DA} + T_{IA}$), must be added and the excess, ($T_{DB} + T_{IB}$), deducted. Thus, the time interval to be measured, denoted here as $T_{in}$, can be expressed as
$$T_{in} = T_C + (T_{DA} + T_{IA}) - (T_{DB} + T_{IB}) \qquad (1)$$
The counter stage measures $T_C$ and converts it to the output digital code CodeC. $T_{DA}$ and $T_{DB}$ are measured by the ADDLL stage, which produces the digital codes CodeDA and CodeDB, respectively. $T_{IA}$ and $T_{IB}$ are measured by the interpolation stage, which produces the digital codes CodeIA and CodeIB, respectively. When all three stages have completed their measurements, the output codes of each stage are sent to the output code calculator for integrated calculation, and the final output code is obtained. The ADDLL stage selects the two adjacent MCLKs (MA1 and MA2) that bracket the positive edge of Signal A to generate CodeDA. These signals are sent to the interpolation stage for more accurate measurement.
Similarly, the ADDLL stage selects the two adjacent MCLKs (MB1 and MB2) that bracket the positive edge of Signal B to generate CodeDB, and these signals are also sent to the interpolation stage for more accurate measurement. However, since the positive edges of Signal A and Signal B appear only once during the measurement, the interpolation stage cannot measure them in time. Therefore, the ADDLL stage preserves the timing relationship between Signal A, MA1, and MA2 by delaying them together, which ensures the correctness of the input signals of the interpolation stage. Signal A_Dly, MA1_Dly, and MA2_Dly are the delayed signals corresponding to Signal A, MA1, and MA2, respectively. Likewise, for Signal B, the ADDLL stage generates Signal B_Dly, MB1_Dly, and MB2_Dly through an appropriate delay as input signals for the interpolation stage.
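The decomposition of the measured interval into a counter part and two residues can be checked numerically with a short sketch. It assumes that the counter stage counts whole reference periods between the first reference-clock edge at or after Signal A and the first one at or after Signal B, and that each residue is the gap from the signal edge to that clock edge. This reading is an assumption, since the timing diagram is not reproduced here. All times are integer picoseconds, and the function name is hypothetical.

```python
def decompose(t_a_ps, t_b_ps, t_ref_ps):
    """Split the interval t_b - t_a into a counter part and two residues.

    Returns (t_c, res_a, res_b), where res_a plays the role of T_DA + T_IA
    and res_b the role of T_DB + T_IB under the stated assumption.
    """
    # First reference-clock edge at or after each signal edge (ceiling division).
    edge_a = -(-t_a_ps // t_ref_ps) * t_ref_ps
    edge_b = -(-t_b_ps // t_ref_ps) * t_ref_ps
    t_c = edge_b - edge_a      # whole reference periods seen by the counter
    res_a = edge_a - t_a_ps    # part the counter missed: to be added
    res_b = edge_b - t_b_ps    # part the counter over-counted: to be deducted
    return t_c, res_a, res_b


t_c, res_a, res_b = decompose(1200, 30500, 8333)
interval = t_c + res_a - res_b   # reproduces t_b - t_a
```

In this example the counter part is exactly three reference periods, and adding the first residue and deducting the second recovers the 29,300 ps interval between the two edges.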
Because the first stage uses a counter for measurement, a wider measurement range is achieved simply by increasing the number of counter bits. Compared with the counter and ADDLL stages, the interpolation stage has a higher measurement resolution; since the measurement resolution of the overall timing monitor is determined by the interpolation stage, the resolution of the timing monitor improves significantly. Therefore, the proposed multi-stage timing monitor achieves a high measurement resolution and a wide measurement range concurrently.
The MCLKs are generated using the digitally-controlled delay line (DCDL) in the ADDLL, and the time interval between two adjacent MCLKs equals the delay of one delay element (DE) in the DCDL. The overall delay of the DCDL equals $T_{ref}$ when the ADDLL is locked. If there are M DEs in the DCDL, the delay of each DE, which is the measurement resolution of the ADDLL stage, is 1/M of $T_{ref}$. As long as the reference clock cycle remains stable, the delay of the DE does not change with PVT variations; this reduces the sensitivity of the ADDLL-stage measurement resolution to environmental variation and keeps it at 1/M of the counter-stage resolution. Furthermore, if the delay of a DE is equally divided into N parts, the measurement resolution of the interpolation stage is 1/N of that of the ADDLL stage. Therefore, the overall timing measurement, $T_{in}$, is expressed as
$$T_{in} = \left[\mathrm{CodeC} \times M \times N + (\mathrm{CodeDA} - \mathrm{CodeDB}) \times N + (\mathrm{CodeIA} - \mathrm{CodeIB})\right] \times \Delta T_I \qquad (2)$$
where $\Delta T_I = T_{ref}/(M \times N)$ is the measurement resolution of the interpolation stage. In addition to providing a high measurement resolution and a wide measurement range, the measurement resolution of each stage of the proposed architecture is unaffected by PVT variations and maintains a fixed proportional relationship, which effectively reduces the sensitivity of the time-monitoring results to environmental variations. The detailed circuit of each stage is described in the following sections.
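As a rough numerical illustration of how the stage resolutions scale, the worked arithmetic below uses the 120 MHz reference clock reported in the experimental section and takes M = 9 from the nine-DE DCDL described for the ADDLL stage. The value N = 20 is an illustrative assumption only, since the number of interpolation steps is not stated in this text, and $\Delta T_D$ is a symbol introduced here for the ADDLL-stage resolution.

```latex
\begin{align*}
T_{ref} &= \frac{1}{120\ \text{MHz}} \approx 8.33\ \text{ns} \\
\Delta T_D &= \frac{T_{ref}}{M} = \frac{8.33\ \text{ns}}{9} \approx 926\ \text{ps} \\
\Delta T_I &= \frac{\Delta T_D}{N} = \frac{926\ \text{ps}}{20} \approx 46.3\ \text{ps}
\end{align*}
```

Under these assumptions the finest step is of the same order as the 47 ps resolution reported for the chip. Because every step is a fixed fraction of $T_{ref}$, the ratios between the stage resolutions stay constant as long as the reference clock is stable.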

ADDLL Stage
The ADDLL consists of a phase detector (PD), a code controller, and a DCDL, as shown in Figure 4a. The PD receives two clock signals, the reference clock (Ref_CLK) and the DCDL output (DCDL_CLK), and generates the input signals of the code controller (UP and DN) from the phase relationship between them. The code controller adjusts the DCDL control code (DCDL_Code) according to UP and DN, thereby changing the delay of the DCDL. When the positive edges of Ref_CLK and DCDL_CLK are aligned, the ADDLL is locked, meaning that the delay of the DCDL equals $T_{ref}$.
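The locking loop can be sketched behaviourally as below, using a step-halving binary search for the code update. This is a simplified model, not the chip's controller. It assumes that a larger control code gives a longer DCDL delay, that UP is asserted when the DCDL delay is shorter than the reference period, and that the search starts at mid-scale with a quarter-scale step. `delay_of_code` is a hypothetical monotonic delay model supplied by the caller.

```python
def lock_addll(t_ref_ps, delay_of_code, code_bits=8):
    """Binary-search the DCDL control code until the line delay is close to t_ref.

    delay_of_code: hypothetical function mapping a control code to the DCDL
    delay in ps; it must increase monotonically with the code.
    """
    max_code = (1 << code_bits) - 1
    code = 1 << (code_bits - 1)      # start from the intermediate code
    step = 1 << (code_bits - 2)      # initial amount of code change (assumed)
    prev_up = None
    while True:
        up = delay_of_code(code) < t_ref_ps   # True models UP, False models DN
        if prev_up is not None and up != prev_up:
            if step == 1:
                break                # PD flipped at the smallest step: locked
            step //= 2               # PD output changed direction: halve the step
        new_code = min(max(code + (step if up else -step), 0), max_code)
        if new_code == code:
            break                    # saturated at the end of the code range
        code, prev_up = new_code, up
    return code
```

For example, with the linear model `lambda c: 6000 + 20 * c` and a reference period of 8333 ps, the search settles on code 116, whose delay of 8320 ps lies within one code step of the target.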
The code controller uses a binary search to control the code-locking process [16]. Figure 4b illustrates the binary-search locking procedure. First, the DCDL control code is set to an intermediate value. When UP or DN is at a high level, the DCDL control code increases or decreases, respectively, by a specific amount. Each time the output of the PD changes from UP to DN or vice versa, this amount is halved. Finally, when the amount is reduced to one, the binary search is completed, and the ADDLL completes its locking procedure.
Figure 5 shows the circuit block diagram of the DCDL, which contains nine DEs. Each proposed cascading-stage DE consists of three delay stages, namely, the first, second, and third digitally-controlled delay cells (DCDCs). These three DCDCs have different controllable delay ranges and resolutions. The first has the widest controllable delay range, followed by the second and then the third DCDC. Conversely, the third has the finest controllable delay resolution, followed by the second and then the first. The overall controllable delay range and resolution of the cascading-stage DE are determined by the first and third DCDCs, respectively. Thus, the advantage of the cascading-stage DE is that it provides both a wide controllable delay range and a fine controllable delay resolution. The first DCDC consists of nine delay buffers, each with its output connected to a tri-state buffer. It selects one of nine signal propagation paths through the one-hot-encoded control signals (C1) to generate different delays. Its controllable delay range can be enlarged easily by increasing the number of delay buffers. However, the controllable delay resolution of the first DCDC equals only the delay of one delay buffer. Therefore, the second DCDC is used to improve the overall controllable delay resolution of the DE. It consists of three cross-coupled circuits, each composed of an inverter and a tri-state inverter.
When the enable signal of a tri-state inverter is at a high level, a current opposing the direction of signal propagation is generated momentarily at each signal transition, which changes the delay of the second DCDC. Therefore, the control signal of the second DCDC (C2) changes the delay of the DE. To further improve the overall resolution of the DE, the third DCDC, composed of six two-input NAND gates, is added. The gate capacitance of a two-input NAND gate changes slightly with the logic level of the control code (C3), thereby changing the delay of the DE.
For these three DCDCs to connect properly, and to ensure that the finest controllable delay resolution of the DE equals the controllable delay resolution of the third DCDC, the controllable delay range of each DCDC should be greater than the controllable delay resolution of the previous DCDC [17] (Table 1).
Figure 6 shows the circuit diagram of the complete ADDLL stage, which includes an ADDLL, nine phase comparators, ten MCLK selectors, and two delay units. The phase comparators provide the selection signals required by the MCLK selectors. The circuit diagram of the MCLK selector is shown in Figure 7a. To ensure that the correct MCLK selection signal arrives earlier than the MCLKs, a delayed version of each MCLK is generated through a delay unit. The MCLK selector receives the MCLK selection signal and decides which delayed MCLK is sent to the interpolation stage. Signal A and Signal B also pass through the same delay unit, which maintains their timing relationship with the selected MCLKs. The operation of the phase comparator and MCLK selector can be explained with an example. If the positive edge of Signal A lies between the positive edges of MCLK[2] and MCLK[3], then QA[2] is at a high level and QA[3] is at a low level. Consequently, the output of the XOR gate (XA[2]) changes to a high level, and the MCLK selection signal (SA[2]) changes to a high level. Thus, the delayed versions of MCLK[2] and MCLK[3] become MA1_Dly and MA2_Dly, respectively.
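The selection example above can be reproduced with a small behavioural sketch. It is a simplification of the circuit: each comparator output Q[i] is modelled as high when the positive edge of MCLK[i] has already occurred at the signal edge within the current reference period, and the XOR of adjacent outputs marks the pair of MCLKs that bracket the signal edge. The function name, the indexing from 0 to M, and the ideal equally spaced MCLK edges are assumptions of this model.

```python
def select_mclks(t_signal_ps, t_ref_ps, m=9):
    """Return (i, q, x): the selected index, comparator bits, and XOR bits.

    The signal edge lies between MCLK[i] and MCLK[i + 1].
    """
    de_ps = t_ref_ps / m                        # delay of one DE when locked
    phase = t_signal_ps % t_ref_ps              # position inside the period
    edges = [k * de_ps for k in range(m + 1)]   # MCLK[0..m] positive edges
    q = [1 if e <= phase else 0 for e in edges] # Q[k]: edge k already passed
    x = [q[k] ^ q[k + 1] for k in range(m)]     # XOR of adjacent comparators
    return x.index(1), q, x


i, q, x = select_mclks(2400, 9000)
```

With a 9000 ps reference period and nine DEs, an edge at 2400 ps falls between MCLK[2] and MCLK[3]: Q[2] is high, Q[3] is low, and only X[2] is asserted, in line with the example in the text.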

Interpolation Stage
The interpolation stage mainly measures the time interval between selected MCLKs and Signal A/Signal B using multiple sampling signals between adjacent MCLKs to improve measurement accuracy. Delay interpolation makes the time interval between multiple sampling signals equal [16]. Figure

Code Calculator
When the three measurement stages complete their measurements, the results are sent to the code calculator for integrated calculation following Equation (2).
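A minimal sketch of this integrated calculation is given below. It assumes the stage resolutions stated in the architecture section: one reference period for the counter stage, 1/M of it for the ADDLL stage, and 1/(M × N) of it for the interpolation stage. It also assumes that the Signal A codes are added and the Signal B codes deducted. The function names and the example code values are hypothetical.

```python
def combine_codes(code_c, code_da, code_db, code_ia, code_ib, m, n):
    """Combine the stage codes into one code in interpolation-stage LSBs."""
    return code_c * m * n + (code_da - code_db) * n + (code_ia - code_ib)


def code_to_time_ps(code, t_ref_ps, m, n):
    """Convert the combined code to time using an LSB of t_ref / (m * n)."""
    return code * t_ref_ps / (m * n)


# Hypothetical stage outputs with M = 9 DEs and an assumed N = 20.
total = combine_codes(code_c=3, code_da=7, code_db=3,
                      code_ia=5, code_ib=12, m=9, n=20)
```

Here the counter contributes 3 × 180 LSBs, the ADDLL stage (7 − 3) × 20 LSBs, and the interpolation stage 5 − 12 LSBs, for a combined code of 613. Keeping the arithmetic in integer LSBs until the final conversion avoids accumulating rounding error across stages.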

Experimental Results and Discussion
The proposed timing monitor was designed and implemented through a mixed-signal design flow and fabricated in the TSMC 0.18 µm 1P6M CMOS standard process with a core area of 685 µm × 650 µm. Figure 9 shows the microphotograph of the chip. Post-layout simulation results of the proposed timing monitor verify the relationship between the input time interval and the output digital code. The range and resolution of the proposed timing monitor are 2.2 µs and 47 ps, respectively. The power consumption of the timing monitor is 7.58 mW when the reference clock frequency is 120 MHz and the operating voltage is 1.8 V. To show the impact of different PVT conditions on the output results, Figure 10 presents the timing measurement errors under three operating conditions. The process corner, supply voltage, and operating temperature of the best condition are Fast/Fast, 1.98 V, and −40 °C, respectively. Those of the typical condition are Typical/Typical, 1.8 V, and 25 °C, and those of the worst condition are Slow/Slow, 1.62 V, and 125 °C. The maximum monitoring errors under the best, typical, and worst conditions are 0.02%, 0.02%, and 0.06%, respectively. Because a phase error remains between the reference clock and the DCDL output when the ADDLL is locked, the monitoring resolution of the ADDLL stage is not exactly equal to 1/M of $T_{ref}$. Additionally, the outputs of the interpolation stage may be unstable because of the mismatch between the charge and discharge currents of the interpolation circuit. These errors cause the overall timing-monitoring error to vary with the input time interval.
Table 2 provides a performance comparison with state-of-the-art timing-monitor designs. According to Table 2, the proposed timing monitor has the best timing-monitoring resolution among the compared designs and therefore provides more accurate timing measurement results for SoC applications.
Additionally, it supports various kinds of time monitoring and can be widely used in time-monitoring applications. If the system requires a wider time-monitoring range, only the number of output bits of the counter needs to be increased. Furthermore, compared with previous designs, it has a lower sensitivity to PVT variations, which greatly improves the stability of the output. In sum, the proposed timing monitor provides a finer timing-monitoring resolution and a wider timing-monitoring range while achieving a lower sensitivity to PVT variations; thus, it is suitable for SoC applications.

Conclusions
This paper presents a structured multi-stage timing monitor for SoC applications. It comprises three time-monitoring stages to achieve wide-range, high-resolution time monitoring, obtain accurate time-monitoring results, and more easily meet system requirements. Additionally, owing to its low PVT sensitivity, the proposed design remains highly stable in complex SoCs with variable operating environments. Therefore, the proposed timing monitor not only effectively reduces the impact of uncertain timing on the system but also makes the operation of the DVFS scheme more efficient and accurate in high-performance, complex SoC design.