Two-Stage Clock-Free Time-to-Digital Converter Based on Vernier and Tapped Delay Lines in FPGA Device

This article presents the idea, design, and test results of a new time-to-digital converter (TDC) implemented in an FPGA device. A high resolution of 13 ps and a measurement range of 3.4 ns are achieved with a two-stage time interpolation (TI). The first stage of the TI uses a Vernier delay line and the second stage uses a single tapped delay line. This solution provides respectable metrological parameters without the need for a clock signal, and it significantly saves the logical resources of an integrated circuit (IC). The proposed method, based on two different variants of the discrete delay line, is easy to design and implement in digital ICs. For experimental verification, the TDC was implemented in a single programmable device from the Virtex-7 family (Xilinx).


Introduction
High-precision time-to-digital converters are crucial in various applications, such as precise laser range finding [1,2], time-of-flight (TOF) measurements of particles in high-energy physics [3,4,5], positron emission tomography in medicine [6,7], Raman spectroscopy for studying the chemical composition of materials [8], and instruments for space exploration [9,10]. The constant increase in the resolution and precision of TDCs results from the ongoing development of both conversion methods and the technological processes used to implement them. One of the most popular methods that is easily implementable in digital ICs is based on a tapped delay line built from discrete delay elements [11]. In such a TDC the resolution is equal to the propagation time of a single delay element, while the measurement range equals the total delay of the line. Therefore, to obtain both a wide measurement range and a high resolution, an exceptionally long delay line has to be used, which typically leads to a significant conversion nonlinearity error. To alleviate the problem, the two-stage time interpolation method proposed by Nutt is typically applied [12]. In this method, a wide measurement range is obtained by counting the periods of a reference clock, while a high resolution is achieved by using a shorter delay line that provides time interpolation within a single clock period [13,14]. Such a solution significantly shortens the delay line, which also improves the precision of the time-to-digital conversion. However, given the limited maximum clock frequency available in typical microelectronic technology and the short propagation times of the delay elements in the latest ICs, the line is still relatively long. For example, in a modern FPGA device, the maximum clock frequency is approximately 700 MHz and the propagation time of a single carry chain element, which is commonly used to create the delay line, is 10 ps. Thus, a line of at least 143 elements is needed to cover a single period of the clock signal (T_CLK divided by the propagation time of a delay element). This is a fairly large number, and it keeps increasing as the feature size of IC devices continues to diminish. It should also be emphasized that in most applications a very wide measurement range, on the order of seconds or more, is not required [15,16], and the additional period counter unnecessarily consumes the logical resources of the integrated circuit. Moreover, counting high-frequency clock periods, as required in the classic Nutt method, can be troublesome and requires additional synchronizers. Furthermore, the applied clock signal should be as stable as possible; otherwise its jitter will degrade the precision of the measurement.
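The line-length estimate above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `min_delay_elements` is hypothetical, and the 700 MHz and 10 ps figures are the approximate values quoted in the text.

```python
import math


def min_delay_elements(f_clk_hz: float, t_element_s: float) -> int:
    """Smallest number of delay elements whose total delay covers one clock period."""
    t_clk = 1.0 / f_clk_hz
    # Round before ceil() so that floating-point noise (e.g. 200.00000000000003)
    # does not add a spurious extra element.
    return math.ceil(round(t_clk / t_element_s, 6))


# Values quoted in the text: ~700 MHz clock, 10 ps per carry chain element.
n_elements = min_delay_elements(700e6, 10e-12)
print(n_elements)
```

The same function shows how the required length grows as delay elements get faster: halving the element delay doubles the number of elements needed for the same clock period.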
It is worth noting that a strictly synchronous architecture is also used in the latest TDCs. Most modern FPGA devices incorporate high-speed communication interfaces, such as Serializer/Deserializer (SerDes) blocks, which can be used for TDC implementation. For example, in a 96-channel SerDes-based TDC, the authors achieved a resolution of 1.2 ns within a measurement range of 304.8 ns [17]. Another article, on a multichannel TDC for PET imaging that uses a similar conversion method, reports a resolution of 321.5 ps [18]. Thus, SerDes-based methods favor the implementation of a large number of channels; however, they are characterized by moderate resolution and precision [19]. TDCs based on asynchronous or combined conversion methods, although more difficult to implement and replicate, can achieve a resolution and precision on the order of single picoseconds within a very wide measurement range [20].
The constant improvement in the parameters of TDCs is the result of, inter alia, the continuous development of the microelectronic technology used to implement the converters. Modern TDCs are typically developed in CMOS technology as ASIC (Application-Specific Integrated Circuit) or FPGA (Field-Programmable Gate Array) devices. ASICs offer great freedom of design, including the use of analog methods, and the possibility of obtaining a very high resolution and precision, down to 1 ps [21], which was not available until recently in FPGAs. FPGAs are more easily available and cost-efficient. The dynamic development of their logical resources has made it possible to implement converters with parameters as good as those of ASICs, or sometimes even better (e.g., with a resolution below 1 ps [22]).
In this paper we propose a two-stage, clock-free TDC based on a new combination of two delay line variants, i.e., the Vernier delay line (VDL) [11,23] in the first interpolation stage and a standard tapped delay line (TDL) in the second one. This combination of conversion methods makes it possible to obtain a relatively high resolution within a reasonable measurement range, and it can be implemented in a versatile and easily available FPGA device.

Method
Figure 1a presents a simplified block diagram of a TDL as typically created in FPGA devices with the use of fast carry chain elements and D flip-flops. The START signal is incrementally delayed at each stage of the TDL. When the STOP signal occurs, the delayed START signals are registered in the flip-flops. Their outputs $Q_n$ carry data about the measured time interval (TI) in thermometer code, which is then converted to natural binary code. The value of the measured TI is calculated according to the formula:

$$TI = m \cdot q \qquad (1)$$

where $q$ represents the resolution of the TDL, equal to the delay of a single buffer in the line ($t_b$), and $m$ is the number of flip-flops that were set during the conversion process.
The VDL block diagram is shown in Figure 1b. The START signal is delayed in a line of serially connected D latches, each with a delay of $t_d$, while the STOP signal is delayed gradually by noninverting buffers with propagation times of $t_b < t_d$. The resolution of the VDL is calculated according to Equation (2):

$$q = t_d - t_b \qquad (2)$$

When the STOP signal catches up with the START signal, a coincidence occurs that carries information about the measured TI, which fulfills relationship (3) [23].
The principle of the two-stage time interpolation method is based on coarse time digitization in the first interpolation stage (FIS) and fine measurement of the residual time interval in the second interpolation stage (SIS). In the proposed TDC the FIS operates according to the Vernier method, and each delay cell of the VDL gradually shortens the measured TI until coincidence of the START and STOP signals occurs (Figure 2). Then the residual TI, $T_{InSIS}$, is transmitted to the SIS, where it is quantized by the TDL with the finer resolution $q_{SIS} \ll q_{FIS}$. The value of the measured time interval is calculated as follows:

$$TI = (N - 1) \cdot q_{FIS} + M \cdot q_{SIS} \qquad (4)$$

where $N$ and $M$ represent the decimal equivalents of the conversion results in the FIS and SIS, respectively. Because $N$ is registered after the coincidence, whereas $T_{InSIS}$ corresponds to the state just before the coincidence, the term $(N - 1)$ is needed in Equation (4).
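The two-stage reconstruction described above can be sketched in Python. The sketch assumes the measured interval takes the form TI = (N − 1)·q_FIS + M·q_SIS, which is what the (N − 1) correction in the text implies. The function names are hypothetical, and the default resolutions of 95 ps and 13 ps are the values reported later in the article.

```python
def thermometer_to_count(bits):
    """Count the leading ones of a thermometer code, e.g. [1, 1, 1, 0, 0] -> 3."""
    count = 0
    for bit in bits:
        if not bit:
            break
        count += 1
    return count


def two_stage_interval_ps(n_fis, m_sis, q_fis_ps=95.0, q_sis_ps=13.0):
    """Combine coarse (FIS) and fine (SIS) results into a time interval in ps.

    Assumed form: TI = (N - 1) * q_FIS + M * q_SIS, with N >= 1 because N is
    the index of the cell registered after coincidence.
    """
    if n_fis < 1:
        raise ValueError("N must be at least 1 (coincidence in the first cell)")
    return (n_fis - 1) * q_fis_ps + m_sis * q_sis_ps


# Illustrative values only: coincidence registered in cell N = 4,
# and three TDL taps set in the second stage.
m = thermometer_to_count([1, 1, 1, 0, 0, 0, 0, 0])
ti_ps = two_stage_interval_ps(n_fis=4, m_sis=m)
print(ti_ps)  # 3 * 95 + 3 * 13 = 324 ps
```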

Design and FPGA-Based Implementation
The block diagram of the designed two-stage TDC is shown in Figure 3. The FIS contains a VDL, a coincidence detector (CD), and a Parallel-In-Parallel-Out (PIPO) register, while the SIS includes the TDL. Both stages are connected via a multiplexer (MUX) that selects the residual TI and transmits it from the FIS to the SIS.

The VDL is designed as two parallel chains: D latches and associated Look-Up Tables (LUTs) operating as noninverting buffers. The VDL output data in thermometer code are converted into natural binary code by a fast combinational decoder. The output signals of each delay cell of the VDL are examined by the CD to determine in which cell the coincidence occurs (Figure 4). The coincidence detector operates on the principle of priority coding. A high logic state at a gate output indicates coincidence of the signals involved and the end of the measurement in the FIS. Information from the coincidence detector is used to address the multiplexer, establishing a path for the transmission of the residual TI from the FIS to the SIS.
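The priority-coding idea can be illustrated with a small behavioural model in Python. This is a sketch under assumptions and not the gate-level circuit of Figure 4: the per-cell flags `caught_up` (true once STOP has caught up with START in that cell) and both function names are hypothetical.

```python
def coincidence_one_hot(caught_up):
    """Return a one-hot list marking the first cell whose flag is set.

    Priority coding: a cell fires only if its own flag is set and no
    earlier cell has already fired.
    """
    one_hot = []
    blocked = False
    for flag in caught_up:
        fire = bool(flag) and not blocked
        one_hot.append(1 if fire else 0)
        blocked = blocked or bool(flag)
    return one_hot


def mux_address(one_hot):
    """Zero-based index of the firing cell, or None if no coincidence occurred."""
    return one_hot.index(1) if 1 in one_hot else None


# Illustrative flags: STOP catches up with START in the third cell.
flags = [0, 0, 1, 1, 1]
hot = coincidence_one_hot(flags)
print(hot, mux_address(hot))
```

In this model the index returned by `mux_address` plays the role of the multiplexer address that selects which cell's START and STOP signals are passed on to the second stage.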
The major design problem was to find a way to efficiently delay the transmission of $T_{InSIS}$ to the MUX inputs. FPGAs offer limited means of creating paths of exact length in an asynchronous design, especially if the required delays are relatively long. The purpose of the PIPO register is to delay the START and STOP signals transferred to the multiplexer until its address is established. The multiplexer is the component that connects the two interpolation stages. After the coincidence signal reaches the address input, the multiplexer selects the register outputs carrying the START and STOP signals from the VDL and CD stage at which the coincidence occurred.
Delay elements in the TDL, which perform fine time quantization, are implemented in the FPGA as the multiplexers of fast carry chains [24]. Data transmission and code conversion are performed in the same manner as in the FIS.
The crucial steps in implementing the TDC in an FPGA device were the layout design and timing analysis. Figure 5 shows the complete layout of the developed TDC. Both the VDL and the TDL were placed vertically in adjacent columns of embedded logic cells to obtain net delays that are as similar as possible and to minimize the nonlinearity error. Each delay cell of the VDL, which contains a single D latch and a LUT, is created with the use of a single logic cell [24]. Unfortunately, the STOP signal cannot be transmitted to the SIS through a global clock buffer because the buffer introduces too long a delay, which distorts the measured TI. To avoid this problem, sophisticated timing constraints were used, which made it possible to achieve negligible signal skew. The MUX and the coincidence detector are placed parallel to the VDL. This placement keeps the interconnections as short as possible, which was crucial for the timing of the signals transmitted from the AND gates to the address input of the multiplexer. In contrast, the PIPO register was placed at an arbitrarily long distance from the FIS to increase the propagation times of the START and STOP signals enough to set the multiplexer address safely before the transmitted TI appears at its data inputs. Moreover, to obtain connection delays that are as equal as possible and to fulfill the timing requirements of the multiplexer, path segmentation [25] was necessary.
Table 1 compares the implementations of three selected TDCs in terms of resource utilization. Only the resources utilized strictly by the TDC were considered. The numbers of Slice registers and Slice LUTs indicate resource utilization within the occupied FPGA slices, whereas the number of BUFGs results from the number of clock signals involved in the design and the way they are distributed inside the device. The classical two-stage TDC and the TDC proposed in this article were implemented in a Virtex-7 device, while the low-resource TDC was implemented in a Virtex-5 device. The two Xilinx device series (Virtex-7 and Virtex-5) differ in speed due to different production technologies, but both share the same ASMBL (Application Specific Modular Block Architecture) logic block architecture, so they are suitable for comparison in terms of logical resource occupancy. The low-resource TDC, based on an oversampling method, has the lowest resource utilization but provides a rather moderate resolution of 625 ps [26]. Since its resolution depends on the clock frequency, it is technology independent. The classical two-stage interpolation method [27] provides a high resolution of 45 ps, but at the expense of a very high occupancy of FPGA logical resources. This is the result of less sophisticated floorplanning than in the proposed clock-free method. The designed two-stage TDC requires symmetric and parallel floorplanning, which helps to achieve evenly distributed data paths and ensures that timing constraints are met. Since the proposed method is clock-free, it naturally saves clock resources, drastically reducing the number of BUFGs. The classical two-stage TDC implementation uses seven BUFG buffers (one for the input clock, four for the multiphase clocks from the MMCM, and two for the TDLs), which is almost 25% of the BUFG resources of the Virtex-7 (Xilinx) FPGA device. The authors of the low-resource TDC did not report the number of BUFGs used; however, implementing the 4x oversampling method with a multiphase clock requires at least four clock buffers. It is also worth noting that, in addition to saving logical resources, the proposed clock-free method significantly limits device power dissipation, especially in comparison with the high-speed clock signals commonly used in modern TDCs.

Results and TDC Parameters
The designed TDC was implemented in a Virtex-7 FPGA device (Xilinx) manufactured in 28 nm CMOS technology. Experimental tests were performed with the use of a VC707 development board (Xilinx), an Agilent 81130A pulse generator, and a control PC. A basic block diagram of the test setup is shown in Figure 6.
The resolutions of both interpolation stages of the TDC were evaluated with the aid of a statistical code density test [28,29]. The coarse resolution $q_{FIS}$ evaluated for the FIS equals 95 ps, and it was calculated according to the VDL resolution formula:

$$q_{FIS} = T_{PLA} - T_{PLU}$$

where $T_{PLA}$ is the mean propagation time of the Latch-based delay line and $T_{PLU}$ is the mean propagation time of the LUT-based delay line.
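A statistical code density test estimates the width of each quantization bin from a histogram of output codes collected for input intervals distributed uniformly over the tested span. The Python sketch below shows the general calculation only. The function name `code_density` and the histogram values are illustrative assumptions, not measurement data from this work.

```python
def code_density(hist, span_ps):
    """Estimate bin widths, mean bin width, and DNL from a code histogram.

    hist    -- number of hits recorded for each output code
    span_ps -- time span covered by the tested codes, in picoseconds

    With uniformly distributed inputs, the share of hits in a bin is
    proportional to the width of that bin.
    """
    total = sum(hist)
    widths = [span_ps * hits / total for hits in hist]
    mean_width = span_ps / len(hist)          # resolution estimate (mean LSB)
    dnl = [w / mean_width - 1.0 for w in widths]  # differential nonlinearity in LSB
    return widths, mean_width, dnl


# Synthetic histogram for a 4-code converter spanning 400 ps.
widths, q_est, dnl = code_density([100, 300, 200, 200], span_ps=400.0)
print(widths, q_est, dnl)
```

A bin that collects fewer hits than average is narrower than the mean bin width and shows up as a negative DNL value, and vice versa.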
The fine resolution $q_{SIS}$ of the SIS, which is the final resolution of the whole TDC, is approximately 13 ps. It was derived as the propagation delay $T_{PMUXCY}$ of a single MUXCY multiplexer in the carry chain ($q_{SIS} = T_{PMUXCY}$). According to a timing experiment performed in the Vivado IDE development system, the CARRY4 primitive in the Virtex-7 device introduces a delay of 53 ps, so the propagation time of each of its four MUXCY multiplexers is approximately 13.25 ps.
The obtained measurement range is relatively wide and equals 3.4 ns. Its value follows from the VDL measurement range, which is the difference between the total delays of the Latch-based and LUT-based delay lines.
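The figures quoted above can be cross-checked with simple arithmetic. In the sketch below, the per-MUXCY delay follows directly from the reported CARRY4 delay. The derived cell and tap counts are rough estimates that assume ideal, uniform delays; they are not values reported in this article.

```python
import math

CARRY4_DELAY_PS = 53.0   # reported delay of one CARRY4 primitive
MUXCY_PER_CARRY4 = 4     # a CARRY4 contains four MUXCY multiplexers
Q_FIS_PS = 95.0          # reported coarse (VDL) resolution
RANGE_PS = 3400.0        # reported measurement range

# Fine resolution: delay of a single MUXCY in the carry chain.
q_sis_ps = CARRY4_DELAY_PS / MUXCY_PER_CARRY4

# Estimate (assumption): VDL cells needed to span the range at q_FIS per cell.
vdl_cells_estimate = math.ceil(RANGE_PS / Q_FIS_PS)

# Estimate (assumption): TDL taps needed to cover one coarse bin.
tdl_taps_estimate = math.ceil(Q_FIS_PS / q_sis_ps)

print(q_sis_ps, vdl_cells_estimate, tdl_taps_estimate)
```

Under these idealized assumptions, the second stage needs only a handful of taps, in contrast to the 143-element line required to interpolate a full clock period in the clocked approach discussed in the Introduction.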
To verify the correctness and precision of the proposed combination of conversion methods, several series of TI measurements were performed. We executed 16 series of 512 measurements each, with the TI varied from 0.2 ns to 3.4 ns in 200 ps steps. The obtained results are shown in Figures 7 and 8. The transfer function of the TDC (Figure 7) is monotonic but relatively nonlinear (Table 2). The TDC measurement uncertainty, presented in Figure 8, is not worse than 14 ps for most of the measurement range. However, one significant increase, to 21 ps, is observed. We attribute it to the significant differences in signal path delays in the design. The delays introduced by both logic resources and interconnect paths are crucial for the proposed TDC. There are five logic levels between the FIS and the SIS (one for the PIPO register and four for the MUX LUTs). All these logic elements contribute to increasing the systematic error (propagation delay) and the random error (jitter). Fortunately, the clock-free design concept frees the TDC from the additional errors associated with using a clock.
Despite efficient floorplanning, the implemented MUX input paths do not have exactly the same lengths. This can cause undesirable extremes that are evident in both the conversion and precision characteristics around a TI of 0.8 ns. Detailed analysis of the implementation between the VDL LUTs and the MUX inputs showed that the paths responsible for transmitting a residual TI of approximately 1 ns differ in length the most. This effect increases the systematic error and measurement uncertainty by introducing an additional delay in the STOP channel. Unfortunately, there is no means of equalizing nets for fully combinatorial paths in an FPGA device.
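The evaluation described in this section (a mean and a standard deviation for each measurement series, followed by a linear regression of the transfer function) can be sketched in Python as follows. The function names and all numerical data are synthetic and purely illustrative; they are not the results shown in Figures 7 and 8 or Table 2. The sketch also assumes that the measurement uncertainty of a series is taken as its sample standard deviation.

```python
from statistics import mean, stdev


def series_stats(samples_ps):
    """Mean and sample standard deviation of one measurement series, in ps."""
    return mean(samples_ps), stdev(samples_ps)


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic example: one short series of readings around 200 ps.
avg_ps, sigma_ps = series_stats([198, 200, 202])

# Synthetic transfer function with a small gain and offset error.
nominal_ps = [200, 400, 600, 800]
measured_ps = [1.02 * t + 5.0 for t in nominal_ps]
slope, intercept = linear_fit(nominal_ps, measured_ps)
print(avg_ps, sigma_ps, slope, intercept)
```

A slope different from 1 indicates a gain error and a nonzero intercept indicates an offset, while the residuals from the fitted line give a measure of the nonlinearity of the transfer function.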

Figure 3. Simplified block diagram of the two-stage TDC.

Figure 4. Schematic of the FIS.


Figure 6. Block diagram of the test setup.


Table 1. Comparison of the three TDC implementations by resource utilization.


Table 2. Linear regression parameters of the transfer function.
