Article

New High-Rate Timestamp Management with Real-Time Configurable Virtual Delay and Dead Time for FPGA-Based Time-to-Digital Converters

Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Via Golgi 40, 20133 Milano, Italy
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2024, 13(6), 1124; https://doi.org/10.3390/electronics13061124
Submission received: 7 February 2024 / Revised: 11 March 2024 / Accepted: 11 March 2024 / Published: 19 March 2024
(This article belongs to the Special Issue System-on-Chip (SoC) and Field-Programmable Gate Array (FPGA) Design)

Abstract: Modern applications require the ability to measure time events with high resolution, a wide full-scale range, and multiple input channels. Time-to-Digital Converters (TDCs) are a popular option for converting time intervals into timestamps. To reduce time-to-market and Non-Recurring Engineering (NRE) costs, a Field-Programmable Gate Array (FPGA) implementation has been chosen. The high number of required bits and channels, however, gives rise to congestion issues when the timestamps are routed in a parallel manner. In this paper, we propose and analyze a novel solution, the Belt-Bus (BB), which performs a parallel-to-serial conversion of the timestamp stream coming from the TDC while maintaining chronological order and a sufficiently high rate, and flagging the presence of timestamp overflows. Moreover, two new useful features are added. The first is a “Virtual Delay” to compensate for offsets due to cable length and FPGA routing path mismatches. The second is a “Virtual Dead-Time” to filter out unforeseen events. Finally, the BB was tested on a Xilinx 28 nm 7-Series Kintex-7 325T FPGA, achieving an overall data rate of 199.9 Msps with very limited resource usage (i.e., lower than 4.5% in total) and consuming only 480 mW in a 16-channel implementation.

1. Introduction

The Time-to-Digital Converter (TDC), which assigns a timestamp to an event, is a device used in various commercial and industrial settings, ranging from basic experimental setups to complex research and development projects [1,2,3,4]. Prominent examples of its major uses include Time-of-Flight Positron Emission Tomography (TOF-PET) [5] in the biomedical field and Laser Rangefinder [6] techniques for 3D imaging in industry and the automotive field. In the context of time-resolved spectroscopic experiments, TDCs are extensively utilized in academic settings, particularly in techniques like Time-Correlated Single Photon Counting (TCSPC) and pump-and-probe experiments performed using a Free Electron Laser (FEL) or synchrotron light [7].
The majority of modern 3D industrial image sensors employ a TDC system to measure the time it takes for a laser pulse to be detected after emission. These sensors, known as Light Detection and Ranging (LIDAR) or Time-of-Flight (TOF) sensors [8,9], find wide applications in areas such as aerial inspection [10] and autonomous driving [6]. Specifically, LIDARs [11] require a TDC with a large number of channels to benefit from a high frame rate, wide field of view, and excellent reliability [12].
Thanks to their cost-effective Non-Recurring Engineering (NRE) expenses, their excellent performance achieved, and their reprogrammable nature, FPGA-based TDC systems stand out as an optimal solution for fast prototyping, both in the realm of research and in industrial Research and Development (R&D) [13].
All of the modern applications mentioned above, both academic and industrial, require TDCs with a high Full-Scale Range (FSR) and resolution (LSB), and thus with a high number of bits, on a significant number of parallel channels, with the ability to operate at high rates (e.g., tens of megahertz per channel) [14]. In addition, these timestamps must be processed immediately, in real time, by various modules working in parallel, typically hosted in programmable logic devices like Systems-on-Chip (SoCs) and Field-Programmable Gate Arrays (FPGAs). This poses a routing challenge between the TDC and the processing modules, both in terms of the congestion of the routing itself (i.e., a high number of required wires) and the potential generation of Cross-Talk (XT) events.
Hence, there is a drive to conceive and develop a parallel-to-serial timestamp data transmission architecture to facilitate the routing of tens of bits (i.e., high FSR and resolution) in multi-channel systems. This is complicated by the need for the serialization process to maintain the chronological order of the timestamps generated by the TDCs while simultaneously managing overflow phenomena, which can significantly impact processing efficiency. As a solution, a novel high-efficiency parallel-to-serial timestamp data transmission architecture based on the AXI4-Stream protocol [15] (also known as AXIS), named the Belt-Bus (BB), has been fully developed and validated as an IP-Core in TDC architectures implemented in FPGAs. It is worth noting that this architecture is equally suitable for implementation in ASICs.
This paper is structured as follows. After a description of TDCs and an overview of multi-channel system interconnection in Section 2, Section 3 describes the proposed protocol and structure, while Section 4 addresses the main issues and their respective resolutions. The final structure is outlined in Section 5. The characterization in terms of area occupancy, power dissipation, and performance, along with measurements conducted on a 16-channel TDC implemented in a Xilinx 28 nm 7-Series Kintex-7 325T FPGA, is presented in Section 6.

2. Time-to-Digital Converter

In Section 2.1, the main Figures-of-Merit (FoMs) of TDCs are summarized; in Section 2.2, the issue of connections for multi-channel TDC systems is illustrated, together with an overview of the state of the art.

2.1. Background

In the scientific literature and in industry, various TDC architectures can be implemented in both ASICs [16] and FPGAs [13]. Regardless of the structure, a TDC assigns a timestamp, referenced to the clock that drives the TDC, to the occurrence of a low–high and/or high–low transition on its inputs. Being a digital device, the TDC uses its clock not only as the temporal reference for the timestamps but also to manage the internal logic. Regardless of the architecture and of its implementation in programmable logic (i.e., FPGA, SoC) or in an ASIC, TDCs are characterized by the following FoMs:
  • Resolution or LSB: the smallest time interval that can be accurately measured.
  • Precision or Jitter: variation in the output timing accuracy of the TDC.
  • Linearity: the degree to which the digital output is proportional to the input time interval, expressed as Differential and Integral Non-Linearity (DNL and INL).
  • Full-Scale Range (FSR): the maximum time interval measurable without encountering overflow issues.
  • Frequency of Overflow (f_ovfl).
  • Number of bits (N_bit).
  • Number of channels operating in parallel (N_CH).
  • Dead Time (DT): the minimum time that must elapse between two successive measurements on the same channel.
  • Maximum Channel Rate (R): the maximum rate of measurements that a single channel can perform.
  • Maximum Output Data Rate (ODR): the maximum rate of output processed timestamps.
  • Area Occupation: physical size (for ASIC) or number of resources (for ASIC and FPGA) occupied.
  • Power Consumption: the amount of power consumed.
From these FoMs, we can easily derive some relationships among them, such as the connection between FSR, LSB, and N_bit (1), between FSR and f_ovfl (2), between FSR, N_bit, and LSB (3), and the obvious inequalities that link DT with R (4) and ODR with R and N_CH (5).
\( N_{bit} = \log_2(\mathrm{FSR} / \mathrm{LSB}) \)  (1)
\( f_{ovfl} = 1 / \mathrm{FSR} \)  (2)
\( \mathrm{FSR} = 2^{N_{bit}} \cdot \mathrm{LSB} \)  (3)
\( R \le 1 / \mathrm{DT} \)  (4)
\( \mathrm{ODR} \le N_{CH} \cdot R \)  (5)
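As a quick numerical check of (1)–(3), consider the TDC later characterized in Section 6 (Table 4), which has LSB = 36.6 fs and N_bit = 32:

\( \mathrm{FSR} = 2^{32} \cdot 36.6\ \mathrm{fs} \approx 157\ \mu\mathrm{s}, \qquad f_{ovfl} = 1/\mathrm{FSR} \approx 6.36\ \mathrm{kHz} \)

in agreement with the FSR and f_ovfl values reported there.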

2.2. Multi-Channel Connection Issues and State of the Art

Considering a multi-channel TDC, regardless of its architecture and implementation (e.g., FPGA/SoC vs. ASIC), two solutions for timestamp read-out can be employed: serial or parallel. The difference between the two approaches is that in a parallel read-out each channel has a dedicated output line for the timestamp, whereas a serial solution employs an arbitration mechanism for serialization. The choice is of little consequence when the number of channels is low but becomes crucial for systems with eight or more channels.
If we consider a parallel read-out approach with a high number of implemented channels, we have a total of N_CH × N_bit lines to route within the device. This creates substantial internal congestion that severely limits place-and-route operations. Furthermore, this solution is inconvenient if the information needs to be transferred off-chip, as it would require a package with a high number of pins. Moreover, managing a large number of lines increases the likelihood of generating XT events that interfere with sensitive parts of the circuitry. For this reason, an output serialization mechanism is incorporated into TDCs with a high number of channels. The effectiveness of such a circuit significantly impacts various FoMs of the TDC, such as the ODR, DT, R, area occupancy, and power consumption.
Indeed, a non-optimized, simpler serialization and sorting mechanism such as a round-robin algorithm (e.g., the Timepix3 ASIC-TDC) performs well in terms of the ODR but not in terms of area and power consumption [17,18]. In this context, a system with N_CH channels requires, approximately, a multiplexer with N_CH inputs (i.e., an N_CH-to-1 MUX), whose area occupation and power dissipation increase exponentially with N_CH [3]. On the other hand, there are serialization systems that, to keep power consumption and area low, rely on memories (e.g., the PicoTDC) that record all timestamps for a certain acquisition time and then output them serially. Some of them, however, have high DTs and low rates (e.g., PETsys) [19,20], while others output timestamps without any ordering or sorting [21,22], requiring an additional processing stage downstream of the TDC if real-time processing is required by various modules working in parallel, such as histograms, counters, and coincidence detectors [4].
The proposed BB solution consists of an innovative serialization structure based on timestamp sorting through comparison, similar to what happens in a round-robin scheme. The distinctive feature, however, is the distribution of the comparison process over 2-to-1 MUXes located within the N_CH nodes. This allows for high efficiency in terms of area occupancy and power consumption, which scale linearly with N_CH. Furthermore, the presence of memories and pipeline structures enables a high output data rate (ODR) to be achieved without compromising DT and R.

3. The Belt-Bus

In Section 3.1, the BB protocol is explained; the operating principles are described in Section 3.2, while a detailed logical description of the functioning is presented in Section 3.3, analyzing the submodules. The area occupancy and power dissipation of each submodule are presented in Section 3.4, with the Xilinx 28 nm 7-Series Kintex-7 325T FPGA used as a case study.

3.1. Protocol

The BB is a synchronous bus based on AXI4-Stream that utilizes only the TVALID, TREADY, and TDATA signals. As a convention, a logical one on both TVALID and TREADY signifies a valid TDATA. The TDATA signal, as illustrated in Figure 1, comprises three portions: the Timestamp (TS) field, whose dimension is N_bit; a 2-bit-wide Function Identifier (FID) field; and the Number of Channel (NUM_CH) field. The dimension of the latter field is not defined a priori but is sized to accommodate the number of channels involved in the measurement (i.e., log2(N_CH) bits), representing the channel’s numerical value.
The architecture of the BB was also designed to address the main issues related to the operation of the TDC without modifying the number of bits of the timestamps, specifically addressing overflow concerns. This was achieved through the utilization of the FID and TS, providing information to downstream modules about particular characteristics deemed useful in subsequent processing.
In the implementation presented here, the FIDs are coded as follows:
  • FID = 00: overflow event (in TS the overflow value is sent);
  • FID = 01: timestamp coming from a rising edge event;
  • FID = 10: unused;
  • FID = 11: timestamp coming from a falling edge event.
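For reference, the TDATA layout of Figure 1 and the FID encoding above can be collected in a small VHDL package. The following is a minimal sketch with illustrative names; the field ordering and the 32-bit, 16-channel sizing are assumptions for illustration, not the authors’ source code:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package bb_protocol_pkg is
  -- Sizing for a hypothetical 32-bit, 16-channel instance (assumed values).
  constant N_BIT : natural := 32;  -- width of the Timestamp (TS) field
  constant CH_W  : natural := 4;   -- log2(N_CH) bits for the NUM_CH field

  -- 2-bit Function Identifier (FID) codes, as listed above.
  constant FID_OVERFLOW     : std_logic_vector(1 downto 0) := "00";
  constant FID_RISING_EDGE  : std_logic_vector(1 downto 0) := "01";
  constant FID_UNUSED       : std_logic_vector(1 downto 0) := "10";
  constant FID_FALLING_EDGE : std_logic_vector(1 downto 0) := "11";

  -- TDATA = NUM_CH & FID & TS (field order assumed from Figure 1).
  subtype tdata_t is std_logic_vector(CH_W + 2 + N_BIT - 1 downto 0);
end package bb_protocol_pkg;
```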
Each time an overflow occurs, a new frame with FID = 00 is injected into the BB. The FSR can thus be increased by a factor of 2^N_bit, from 2^N_bit · LSB up to 2^N_bit × (2^N_bit · LSB). However, this improvement comes at a cost to the Output Data Rate (ODR), as an overflow frame must be sent once every 2^N_bit · LSB seconds in place of a timestamp. With the overflow frequency denoted as f_ovfl = 1/(2^N_bit · LSB) and f_CLK,BB as the clock frequency of the BB, the rate is given by:
\( \mathrm{ODR} = f_{CLK,BB} - N_{CH} \cdot f_{ovfl} = f_{CLK,BB} - N_{CH} \cdot \frac{1}{2^{N_{bit}} \cdot \mathrm{LSB}} \)  (6)
Equation (6) shows that there is a trade-off regarding this aspect: by decreasing N_bit, the overflow frequency grows and the ODR decreases. It is important to notice that this trade-off becomes heavier as the number of channels increases. However, it also depends on the size of the FPGA used. With larger and more complex FPGAs, routing issues can be minimized, enabling a slight increase in N_bit and, in most cases, in f_CLK,BB, thus enhancing the available data rate.
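Plugging in the figures of the 16-channel implementation characterized in Section 6 (N_bit = 32 and LSB = 36.6 fs, so f_ovfl ≈ 6.36 kHz, with f_CLK,BB = 200 MHz and N_CH = 16), Equation (6) gives:

\( \mathrm{ODR} = 200\ \mathrm{MHz} - 16 \cdot 6.36\ \mathrm{kHz} \approx 199.9\ \mathrm{Msps} \)

which matches the rate reported in Table 6; the overflow traffic costs only about 0.05% of the bus throughput.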

3.2. Principle of Operation

The BB is composed of a cascade of nodes (light blue in Figure 2), each representing a single channel, that serialize, in a pipelined fashion, the timestamps coming from the TDC channels (yellow in Figure 2) into a chronological sequence. Every node has two inputs in the BB protocol: the output of the preceding node, the “Top” port in Figure 2, and the current channel, the “Left” port in Figure 2. The only restrictions on the number of channels that may be added with this chain arrangement are the amount of hardware available to implement the nodes and the constraint on the average channel rate (i.e., the ratio between the ODR and the total number of channels).
When a timestamp arrives at a node from the channel entrance (i.e., the “Left” port), it propagates through the subsequent nodes to the terminal one. While timestamps can be arranged chronologically within a single channel, that arrangement may be lost as they move through the chain of nodes. To prevent this, each node decides whether to prioritize the input from the channel (“Left”) or that from the previous node (“Top”), based on which of the two timestamps is temporally earlier. It might not be feasible to compare the input timestamp in the present node, though, because of potential delays in measurements on channels connected to earlier nodes and, consequently, the absence of the comparative timestamp. In this instance, the current timestamp could be propagated without adhering to the chronological order. Of course, the first node allows only the injection of the “Left” signal, so its “Top” port has TVALID hardcoded to ‘0’.
In order to address this problem, a timestamp from the input channel passes through four phases within the node. Described here from the perspective of their respective functions (their dynamics are contextualized in the architecture presented in the next section), these phases are the:
  • Retain Phase: the timestamp from the TDC channel (also known as “Left” port) is blocked for a proper time at the input of the node in order to compensate for the pipeline introduced by the registers and FIFO present on the previous nodes.
  • Hold-on Phase: the timestamp from the TDC channel (also known as “Left” port) waits for a timestamp from the previous node for comparison for a finite time. If this occurs, the older timestamp is propagated at the node output.
  • Inject Phase: if the timestamp from the previous node (also known as “Top” port) is not valid or older, the timestamp from the channel (also known as “Left” port) is propagated at the node output.
  • Discard Phase: the timestamp from the TDC channel (also known as “Left” port) is simply discarded (not propagated in BB) because there is no propagation permission in the node chain within a finite time (for instance, if the chain were full).
A graphical view of the operation of these phases on three timestamps is shown in Figure 3.

3.3. Architecture

The designed architecture of the BB was implemented on a Xilinx 28 nm 7-Series FPGA as an IP-Core and is constituted by a cascade of stages called Node Inserters (Figure 4).
With reference to Figure 5, each Node Inserter is made up of three components:
  • The Delay Synchronizer implements the Retain Phase shown in Section 3.2.
  • The Inserter is driven by logic that, using the information from the Delay Synchronizer, generates the selection signal for a multiplexer between the timestamp from the current channel (also known as “Left” port) and the one coming from the previous node (also known as “Top” port). Thus, it implements the Hold-on, Inject, and Discard Phases shown in Section 3.2.
  • The Super Sampler, which is a register that propagates the selected input to the output, ensuring the ready–valid handshake proper to the AXI4-Stream protocol without losing a clock cycle.
Figure 5. Top level block schematic of Node Inserter structure with submodules.
The Delay Synchronizer makes the current timestamp from the TDC channel (“Left” port) comparable to the one coming from the preceding node (“Top” port). First of all, the incoming timestamp from the TDC channel enters a synchronous First-In First-Out (FIFO) memory, clocked at f_CLK,BB, hosted in the Synchronizer. The validity of the output data is deasserted, preventing its propagation, until the propagation time through the FIFO has elapsed (i.e., the Retain Phase). The TREADY signal at the “Top” interface allows the information to be stored in the FIFOs and registers of the previous nodes, thus avoiding the need for an additional FIFO.
Due to the potential for skew and jitter when signals spread over a wide region, a highly intricate and sophisticated data management system is required inside the Delay Synchronizer to ensure, from a timing perspective, that timestamp values can be compared. The Inserter module then behaves as a multiplexer driven by logic that compares the current timestamp (“Left” port) with the one coming from the preceding node (“Top” port), based on information returned by the Delay Synchronizer, and moves overflows ahead with the highest possible priority, followed by timestamps from the oldest to the newest. If the bandwidth saturates, newer timestamps are discarded. In detail, in the Inserter, if an overflow condition is communicated by the timestamp at the output of the Delay Synchronizer (“Left” port), or if that timestamp is older than the one present from the previous node (“Top” port), the incoming timestamp (i.e., the output of the Delay Synchronizer) is propagated forward. The assessment of this condition continues until a timeout, equal to an interval comprising the clock jitter, skews, and the delays of the pipeline stages that constitute the implementation (i.e., the Hold-on Phase). At the timeout of the Hold-on Phase, if the node’s bus is ready to receive, the timestamp is propagated (Inject Phase); otherwise, it is discarded, allowing a more recent timestamp to be placed at the FIFO output (Discard Phase).
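The selection rule implemented by the Inserter can be condensed into the following combinational sketch; signal names are illustrative, and the Hold-on timeout, the AXI4-Stream handshake, and the pipeline stages are omitted, so this is a simplification of the behavior described above, not the authors’ RTL:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity inserter_select is
  generic (N_BIT : natural := 32);
  port (
    left_ts   : in  unsigned(N_BIT - 1 downto 0);  -- from the Delay Synchronizer
    top_ts    : in  unsigned(N_BIT - 1 downto 0);  -- from the previous node
    left_ovfl : in  std_logic;                     -- "Left" carries an overflow frame
    top_valid : in  std_logic;                     -- a "Top" timestamp is available
    sel_left  : out std_logic                      -- '1': propagate "Left"; '0': "Top"
  );
end entity inserter_select;

architecture rtl of inserter_select is
begin
  process (left_ts, top_ts, left_ovfl, top_valid)
  begin
    if left_ovfl = '1' then
      sel_left <= '1';            -- overflow frames have the highest priority
    elsif top_valid = '0' then
      sel_left <= '1';            -- nothing to compare against: inject "Left"
    elsif left_ts < top_ts then
      sel_left <= '1';            -- "Left" is older: it goes out first
    else
      sel_left <= '0';            -- "Top" is older (or equal): propagate it
    end if;
  end process;
end architecture rtl;
```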

3.4. Area Occupancy and Power Dissipation

The area occupation of the Node Inserter and its related submodules is a function of the number of bits in the TS field (i.e., N_bit). Table 1 presents the area occupancy in terms of Carry Logic (CARRY), Look-Up Tables (LUT), Flip-Flops (FF), and Look-Up Table RAM (LUTRAM). No Digital Signal Processor (DSP) modules or Block RAM (BRAM) resources are utilized. Additionally, the same table provides information on power dissipation (dynamic only, since the power dissipated by the module is primarily of a dynamic nature), considering a maximum clock frequency of 130 MHz.

4. Main Issues and Solutions

As presented in Section 3, the BB has two limitations.
The principal issue is that the present structure does not fully account for the uncertainty of the timestamp arrival time. In fact, two timestamps produced on distinct channels during the same TDC clock cycle can arrive at the Node Inserter at different times. This is particularly true at high channel rates. The primary cause is the need for asynchronous FIFOs to accommodate the Clock Domain Crossing (CDC) between the clock of the TDC (f_CLK,TDC) and the clock of the BB (f_CLK,BB). This creates the possibility of unordered timestamps in specific scenarios. The solution to this issue is addressed in Section 4.1.
Another issue is that, considering the Xilinx 28 nm 7-Series FPGAs as a technological node, due to the architecture presented in Section 3, the maximum available BB clock frequency is not very high, about 130 MHz (i.e., f_CLK,BB < 130 MHz), which corresponds to only 16.25% of the maximum clock frequency that this technological node supports (i.e., 800 MHz). The cause of this limitation is analyzed and discussed in Section 4.2; thanks to the two modifications introduced there, a frequency of 200 MHz (25% of the maximum available) can be achieved. The area occupancy and power dissipation of each submodule are presented in Section 4.3.

4.1. Unsorted Timestamps Issue

The first issue that has been addressed is related to the presence of different clock domains, which require asynchronous FIFOs between the TDC and BB as a CDC (Figure 6).
Under this condition, there are two further causes of timestamp unsorting. The first one, deterministic, arises when one channel has a high timestamp rate and another a lower one: since the data already stored in an asynchronous FIFO must exit first, a timestamp entering the asynchronous FIFO of a crowded channel exits at a later instant and is therefore injected into the BB after a timestamp from a less crowded channel.
The second one is non-deterministic and is due to the unpredictable CDC propagation delay. To better understand this, let us focus briefly on how a CDC works in the following subsections.

4.1.1. CDC Uncertainty

In a basic CDC circuit, the simplest 1-bit two-stage architecture can be considered (Figure 7). In the first stage (flip-flop A), data are captured by a register in the source clock domain on the rising edge of the source clock (i.e., clk A). In the second stage (flip-flop B), the captured data are then transferred to a register in the destination clock domain on the rising edge of the destination clock (i.e., clk B). The clock uncertainty in this circuit arises because the rising edges of the source and destination clocks may not be perfectly aligned in time due to factors such as clock skew, jitter, or delay. As a result, data may be captured by the source register at a slightly different time than they are transferred to the destination register.
As a consequence, the sampling register can enter metastability [23], on average once per Mean Time Between Failures (MTBF), given by the relation \( \mathrm{MTBF} = \frac{f_r}{t_0 \cdot f_{CLK,A} \cdot f_{CLK,B}} \), where f_r is a parameter that depends on the flip-flop used, t_0 is a constant related to the width of the time window, or aperture, wherein a data edge triggers a metastable event, f_CLK,A is the source clock domain frequency, and f_CLK,B is the destination clock domain frequency. To quickly exit a possible metastability transient, the well-established cascade of registers must be added. If, for example, two registers are put in cascade, there is not only one clock uncertainty due to sampling but also another, with lower probability, needed by the first register to recover from metastability. This uncertainty increases as the number of cascaded registers grows. Even in an asynchronous FIFO, if the two clocks are not derived from the same source (e.g., a divided clock), similar mechanisms are used internally, giving rise to a temporal uncertainty at the FIFO output.
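As an illustration of the general technique, the basic two-stage synchronizer of Figure 7 can be written in VHDL as follows; this is the textbook single-bit structure, not the FIFO-based CDC actually used between the TDC and the BB:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity cdc_sync_2ff is
  port (
    clk_b    : in  std_logic;  -- destination clock domain (clk B)
    async_in : in  std_logic;  -- bit launched from the clk A domain
    sync_out : out std_logic   -- bit safe to use in the clk B domain
  );
end entity cdc_sync_2ff;

architecture rtl of cdc_sync_2ff is
  signal meta, stable : std_logic := '0';
begin
  process (clk_b)
  begin
    if rising_edge(clk_b) then
      meta   <= async_in;  -- first stage: may go metastable
      stable <= meta;      -- second stage: one clk B period to settle
    end if;
  end process;
  sync_out <= stable;
end architecture rtl;
```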

4.1.2. Issue Evidence

By examining simulated waveforms, focusing on two channels for simplicity, two situations emerge that result in unsorted timestamps. Figure 8 shows these two situations. In the first case (top side of Figure 8), timestamps “12” and “13” are sent to the “Left” ports of their respective Node Inserters at the same instant, but due to the CDC issue, timestamp “13” is read before timestamp “12”, resulting in an unordered timestamp error on the BB. In the second case (bottom side of Figure 8), timestamp “33” stays in the asynchronous FIFO, due to CDC uncertainty, for a longer time than timestamp “34”, causing an ordering issue at the output.

4.1.3. Issue Solving

In order to mitigate this failure, two modifications have been introduced in the Node Inserter.
Since the timestamp entering the Node Inserter cannot be injected while the Hold-on Phase is active unless a newer timestamp reaches the “Top” port, the Retain Phase of the Delay Synchronizer is lengthened by a value larger than the time uncertainty introduced by the asynchronous FIFO used as a CDC, allowing the data to be properly compared by the Inserter. In this way, the issue shown at the top of Figure 8 is solved.
The second modification, to the Inserter, is required to solve the issue shown at the bottom of Figure 8. Instead of sending data to the Super Sampler as soon as the Inject Phase begins, another check is performed by simply waiting some clock cycles after the timestamp coming from the previous node is propagated. In this way, when the bus is full, the data comparison is always performed, avoiding unsorted timestamps. The number of cycles to wait is proportional to the node number, to compensate for the pipelines, and to the ratio between f_CLK,TDC and f_CLK,BB (where f_CLK,TDC > f_CLK,BB). No wait cycles are required if f_CLK,TDC < f_CLK,BB.

4.1.4. Order Checker

After the modifications introduced in Section 4.1.3, the chronological order issue becomes very rare and thus almost negligible: fewer than one in a billion samples (i.e., <1 × 10^−9). This residual error is due to the stochastic nature of the CDC (i.e., the MTBF): especially when the number of channels is high and events occur randomly and at a high rate, unsorted timestamps may occasionally be present. A possible way to solve the problem would be to increase the depth of the asynchronous FIFO inside the Delay Synchronizer excessively, leading to area occupancy problems in the FPGA. To avoid this, given the very low probability of encountering unsorted data, these instances are simply discarded, resulting in a negligible loss.
To perform the chronological order check, another IP-Core has been developed, the Order Checker, which takes as input the data from the last Node Inserter and checks timestamp sorting, deasserting the validation if incoming data do not respect this condition.
As can be seen from Figure 9, the Order Checker has an AXI4-Stream input (s00_bb in Figure 9) and an AXI4-Stream output (M00_bb in Figure 9) for BB data, along with an AXI4 Memory-Mapped port used to read out the number of unsorted timestamps, solely for debugging purposes (S00_axi in Figure 9). This module compares the incoming timestamp with the already stored one: if the new one is more recent, the data are propagated and replace the stored one; if it is older, the valid signal is deasserted, and the related counter is incremented by one. If no data are present, i.e., the module has just been initialized, the first timestamp is stored and propagated.
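The comparison just described can be sketched as follows, with illustrative names, a simple valid-only stream in place of the full AXI4-Stream handshake, and overflow frames ignored for brevity:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity order_checker is
  generic (N_BIT : natural := 32);
  port (
    clk          : in  std_logic;
    rst          : in  std_logic;
    s_ts         : in  unsigned(N_BIT - 1 downto 0);  -- from the last Node Inserter
    s_valid      : in  std_logic;
    m_ts         : out unsigned(N_BIT - 1 downto 0);  -- checked BB output
    m_valid      : out std_logic;
    unsorted_cnt : out unsigned(31 downto 0)          -- debug counter
  );
end entity order_checker;

architecture rtl of order_checker is
  signal last_ts  : unsigned(N_BIT - 1 downto 0) := (others => '0');
  signal seen_any : std_logic := '0';
  signal cnt      : unsigned(31 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      m_valid <= '0';
      if rst = '1' then
        seen_any <= '0';
        cnt      <= (others => '0');
      elsif s_valid = '1' then
        if seen_any = '0' or s_ts >= last_ts then
          last_ts  <= s_ts;   -- accept: store and propagate
          m_ts     <= s_ts;
          m_valid  <= '1';
          seen_any <= '1';
        else
          cnt <= cnt + 1;     -- older than the stored one: discard and count
        end if;
      end if;
    end if;
  end process;
  unsorted_cnt <= cnt;
end architecture rtl;
```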

4.2. Limited Output Data Rate Issue

By implementing the Node Inserters in Vivado, considering the Xilinx 28 nm 7-Series FPGAs as a technological node, it can be clearly seen that the maximum clock frequency is limited by two different sources.
The main one comes from the way the comparison between the two timestamps entering the node (i.e., the one coming from the “Left” and the one coming from the “Top” port) is performed: the “>” operator in VHDL code requires, with the default encoding performed by Vivado, the computation of two subtractions and an unsigned comparison. This results in a very high requirement in terms of logic resources, mainly LUTs and CARRY, because the number of bits of the timestamps is high (e.g., 32 to 64). The intervention carried out to increase the maximum clock frequency was to replace the comparison operation with a simpler one: only a signed subtraction between the timestamps (i.e., “Left” minus “Top”) is performed, and then a check is made on the sign of the result. If the result is positive, the “Top” timestamp is older (i.e., “Left” is bigger than “Top” and thus more recent in time) and has priority; otherwise, the “Left” data are propagated. Since the sign check only requires observing the MSB of the result, the amount of Carry Logic decreases by a factor of two, as it is needed to perform only one operation instead of three.
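A minimal sketch of this optimization follows, with illustrative names: a single subtraction on sign-extended operands, whose MSB (the sign bit) directly provides the selection criterion:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ts_compare is
  generic (N_BIT : natural := 32);
  port (
    left_ts      : in  unsigned(N_BIT - 1 downto 0);
    top_ts       : in  unsigned(N_BIT - 1 downto 0);
    top_is_older : out std_logic  -- '1': propagate "Top" first
  );
end entity ts_compare;

architecture rtl of ts_compare is
  signal diff : signed(N_BIT downto 0);  -- one extra bit for the sign
begin
  -- "Left" minus "Top": a non-negative result means "Left" is more recent,
  -- hence "Top" is older and must be propagated first.
  diff <= signed(resize(left_ts, N_BIT + 1)) - signed(resize(top_ts, N_BIT + 1));
  top_is_older <= not diff(N_BIT);  -- only the MSB (sign) is observed
end architecture rtl;
```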
The second improvement is introduced by replacing the Super Sampler with a more efficient pipelined structure, called the AXIS Register Slice, that occupies the same hardware resources. Its working principle is similar to a two-slot FIFO: the data entering the module are stored directly in the output register if nothing is already stored in it. Thanks to these two modifications, a frequency of 200 MHz (25% of the maximum available) can be achieved.

4.3. Area Occupancy and Power Dissipation

The area occupation of the Node Inserter and its related submodules (with the modifications proposed in this section) is a function of the number of bits in the TS field (i.e., N_bit). Table 2 presents the area occupancy in terms of CARRY, LUT, FF, and LUTRAM. No DSP or BRAM resources are utilized. Additionally, the same table provides information on power dissipation, considering a maximum clock frequency of 200 MHz. Comparing Table 2 to Table 1, it is possible to observe a similar occupation and a decrease by a factor of two in the CARRY occupied by the Inserter, along with the replacement of the Super Sampler with the AXIS Register Slice. Moreover, a higher usage of LUTs and FFs is observed in the Delay Synchronizer to address the issues outlined in Section 4.1. The higher power dissipation is attributed to the higher clock frequency (200 MHz instead of 130 MHz).

5. Main New Features

This section describes two new features: the Virtual Delay in Section 5.1 and the Virtual Dead Time in Section 5.2. These two improvements were implemented with very careful attention to the timing analysis, to preserve the maximum clock frequency of 200 MHz obtained in Section 4.2. Lastly, an overview of the complete structure of the BB is given in Section 5.3. The area occupancy and power dissipation of each submodule are presented in Section 5.4.

5.1. Virtual Delay

In many applications, only relative times must be measured by computing differences between different channels. For this reason, static offset compensation can be very useful, for example, to have the resulting histogram centered at zero. In order to perform this task, a Virtual Delay feature has been developed in the BB. In this way, the timestamps coming out from the Node Inserters are not only chronologically ordered but also translated in time. This allows compensating offsets due to both different cable lengths and FPGA routing path mismatches between TDC channels.

5.1.1. Architecture

A simple summation of the delay onto the incoming timestamp is not sufficient to accomplish this feature: if no synchronization mechanism were present, the BB would “break”, leading to unsorted timestamps. As will be explained later, since synchronization requires memory, the Virtual Delay cannot reach very high values (i.e., up to 2^20 × LSB); otherwise, the resource usage in the FPGA would be enormous. On the other hand, since the static offset due to FPGA routing is a few tens of picoseconds, considering an LSB of tens of femtoseconds, implementing a maximum delay on the order of hundreds of nanoseconds, which is quite feasible, is enough. For example, approximating the propagation speed of signals as 30 cm/ns, a 1 µs delay would be sufficient to compensate for a 300 m cable length offset, which is a very large value. This is the main reason why the maximum delay value is constrained to be less than or equal to the maximum timestamp value.
However, this is not the only aspect that must be managed very carefully. When a value is summed to a timestamp, the result can be larger than the maximum value of 2^N_bit − 1. In this case, an overflow has to be generated, and careful attention must be paid to discarding the next incoming one. That said, another reason for limiting the delay to the maximum timestamp value is that the maximum overflow difference between the original and the delayed sample is then one, which makes the process much simpler to implement.
Moreover, for the implementation of the Virtual Delay feature, a modular architecture has been used. A new module, the Virtual Delay Inserter, has been developed and instantiated in series before a modified version of the Delay Synchronizer called Virtual Delay Synchronizer. In detail:
  • The Virtual Delay Inserter handles the summation between the delay and the timestamp. It is also responsible for overflow handling when overflows must be generated or discarded.
  • The Virtual Delay Synchronizer handles the synchronization of the delayed timestamps.
Figure 10 shows the modular structure of the Node Inserter with the Virtual Delay functionality.

5.1.2. Virtual Delay Inserter

This module consists of two pipeline stages.
The first stage is responsible for the timestamp computation. Since summing the delay can introduce an overflow, the second stage is needed to handle the overflow generation and the correct sampling and propagation of the timestamp, in order not to lose data. Since, when an overflow frame is generated, the next received one must be discarded, the data that are overwritten by the generated frame are stored in a register. After this event, the timestamps are propagated through this register until there are no valid data to be sent; when the rate is at its maximum, this happens when an overflow is received from the TDC. Since the Virtual Delay can vary over time, another possible issue arises: if an overflow has been generated and the delay value decreases, the new timestamp can have a value that refers to the previous overflow period. To solve this issue, when an overflow has been generated, the Virtual Delay, if it is lower than the previous one, is updated only after an overflow from the TDC has been received. Finally, the generated-overflow flag is needed by the Virtual Delay Synchronizer.
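A simplified sketch of the first stage is given below, with illustrative names and under the assumption (stated above) that the delay never exceeds the maximum timestamp value; the carry-out of the modular addition flags the wrap that the second stage turns into a generated overflow frame:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity vd_sum_stage is
  generic (N_BIT : natural := 32);
  port (
    clk       : in  std_logic;
    ts_in     : in  unsigned(N_BIT - 1 downto 0);
    vdelay    : in  unsigned(N_BIT - 1 downto 0);  -- Virtual Delay, in LSB units
    ts_out    : out unsigned(N_BIT - 1 downto 0);
    wrap_ovfl : out std_logic  -- '1': generate overflow frame, discard the next one
  );
end entity vd_sum_stage;

architecture rtl of vd_sum_stage is
begin
  process (clk)
    variable sum : unsigned(N_BIT downto 0);
  begin
    if rising_edge(clk) then
      sum := resize(ts_in, N_BIT + 1) + resize(vdelay, N_BIT + 1);
      ts_out    <= sum(N_BIT - 1 downto 0);  -- wraps past 2**N_BIT - 1
      wrap_ovfl <= sum(N_BIT);               -- carry-out marks the wrap
    end if;
  end process;
end architecture rtl;
```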

5.1.3. Virtual Delay Synchronizer

Compared to the previously introduced Delay Synchronizer, the Virtual Delay Synchronizer converts the Virtual Delay into a number of BB clock cycles (\(\Delta VD\)) to wait before starting the Retain Phase, in order to synchronize the injection of the delayed timestamp into the BB.

5.2. Virtual Dead Time

The ability to insert a programmable dead time between measurements on the same channel is a very useful feature. When a signal from a detector has a rising and/or a falling edge, a TDC timestamp is produced. Although filtering is typically applied, some input noise can still exist and cause unforeseen timestamps (red in Figure 11). These spurious timestamps could be discarded in post-processing by the elaboration modules; however, when the BB rate is high, such discarding might lead to saturation and result in the loss of samples. The Virtual Dead-Time functionality, which blocks incoming events for a programmable period of time (i.e., the Virtual Dead Time, represented by the keyword KILL in Figure 11) after one has been received, has been added to prevent this.
A new IP-Core, named the Time Killer (Figure 12), has been developed to enable this feature. The output valid is deasserted if the difference between the incoming timestamp and the last accepted one is smaller than the Virtual Dead-Time value. The IP-Core accepts timestamps from the TDC (S00_AXIS input port in Figure 12) and a Virtual Dead-Time value, and provides timestamps to the “Left” port of the Node Inserter (M00_AXIS port in Figure 12).
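The filtering rule can be sketched as follows, with illustrative names, a simple valid-only stream in place of the full AXI4-Stream handshake, a modular timestamp difference assumed to absorb the wrap-around, and initialization handling omitted for brevity:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity time_killer is
  generic (N_BIT : natural := 32);
  port (
    clk       : in  std_logic;
    s_ts      : in  unsigned(N_BIT - 1 downto 0);  -- timestamp from the TDC
    s_valid   : in  std_logic;
    dead_time : in  unsigned(N_BIT - 1 downto 0);  -- Virtual Dead Time, in LSB units
    m_ts      : out unsigned(N_BIT - 1 downto 0);  -- to the "Left" port of the node
    m_valid   : out std_logic
  );
end entity time_killer;

architecture rtl of time_killer is
  signal last_ts : unsigned(N_BIT - 1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      m_valid <= '0';
      if s_valid = '1' then
        if s_ts - last_ts >= dead_time then  -- modular difference (assumption)
          last_ts <= s_ts;
          m_ts    <= s_ts;
          m_valid <= '1';                    -- event accepted
        end if;                              -- else: event "killed"
      end if;
    end if;
  end process;
end architecture rtl;
```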

5.3. Final Belt-Bus Structure

In conclusion, the new BB structure’s design, including the Virtual Delay and Virtual Dead Time functions, is shown in Figure 13.

5.4. Area Occupancy and Power Dissipation

The area occupation of the Node Inserter and its related submodules (with the modifications proposed in this section) is a function of the number of bits in the TS field (i.e., N_bit). Table 3 presents the area occupancy in terms of CARRY, LUT, FF, and LUTRAM. No DSP or BRAM resources are utilized. Additionally, the same table provides information on power dissipation, considering a maximum clock frequency of 200 MHz. Comparing Table 3 to Table 1, it is possible to observe, besides the changes already discussed in Section 4.3, the additional resources required by the Virtual Delay Inserter and by the Time Killer. The higher power dissipation is attributed to the higher clock frequency (200 MHz instead of 130 MHz) and to the additional modules.

6. Measures and Characterizations

Two TDC IP-Cores, with 3 and 16 channels (and 3 and 16 parallel outputs, respectively), provided by TEDIEL S.r.l. [24], were utilized to test the entire system and validate what has been proposed. Table 4 reports all the performance metrics of the two TDCs, which are identical in all respects except for the number of channels. Obviously, the ODR is expressed as the output rate of each individual channel, which is then modified by inserting the Node Inserters and the BB structure downstream.
Tests on the reference TDC architectures are performed on different FPGAs. The host FPGAs are both Xilinx 28 nm 7-Series: an Artix-7 100T for the 3-channel TDC (Figure 14) and a Kintex-7 325T for the 16-channel solution (Figure 15).
The tests involve injecting a pseudo-random signal into each channel of the TDC using the ACTRIVE Arbitrary Waveform Generator (AWG). Since the AWG has 4 channels, in the 16-channel solution each channel of the AWG was split so that it drives 4 channels of the TDC. Pseudo-random signals were generated so as to ensure a distance between successive events greater than the set dead time. The experiment was conducted by uniformly increasing the rates of all TDC channels, monitoring and subsequently analyzing the output of the BB to verify its correct operation. The status of the various nodes and of the Order Checker was also monitored, to verify the occurrence of Discard Phases and the absence of unordered timestamps. To validate the correct operation of the Time Killer and Virtual Delay modules, the experiment was automated with a script and repeated for numerous values of dead time (1024) and virtual delay (512).
Another test performed, once again with the help of the AWG and a script for its automation, both on the 3-channel and 16-channel versions, involved keeping the rate of all channels except one at zero and then increasing it uniformly, while monitoring and subsequently analyzing the output of the BB and the Order Checker. The experiment was repeated for each channel, always with 1024 values of dead time and 512 of virtual delay.
Outcomes of the experiments are reported in Table 5 for the 3-channel TDC and in Table 6 for the 16-channel solution.
For each of the tests described above, the absence of unsorted timestamps at the BB output was first checked. Even though the value of the Order Checker counter was always 0, this module has been kept in place for safety reasons, mainly because, although the CDC is simple to model during simulation, in real situations the clock uncertainty it causes can in rare cases be high. Another milestone concerns the ODR, which reaches up to 149.9 Msps and 199.9 Msps for the Artix-7 and the Kintex-7, respectively. Additionally, the presence or absence of at least one Discard Phase was monitored; Discard Phases were observed when the sum of the rates across all channels reached 97% of the ODR, in both solutions.
In both solutions, the ODR, f_ovfl being negligible as per (6), is very close to f_CLK,BB, which is 150 MHz for the Artix-7 solution and 200 MHz for the Kintex-7, representing 24% and 25% of the maximum clock frequency that the two devices can handle (625 MHz for the Artix-7 and 800 MHz for the Kintex-7). This is an excellent result considering that, typically, the maximum clock frequency of a system in an FPGA is between 10% and 15% of the maximum frequency. Moreover, from the perspective of area utilization and power consumption, we observe that the contribution of the BB is negligible (by at least a factor of 10) compared to that of the TDC.
Furthermore, the dependence of the maximum clock frequency of the BB (expressed as a ratio to the maximum frequency allowed by the FPGA, f_FPGA^MAX) on the number of nodes/channels (N_CH) and the number of bits (N_bit) was analyzed on different devices of the Artix-7 family (i.e., the 35T and 100T, with 32,280 and 101,440 logic cells, respectively, and a maximum clock frequency of 625 MHz) and of the Kintex-7 family (i.e., the 325T and 480T, with 326,080 and 477,760 logic cells, respectively, and a maximum clock frequency of 800 MHz) by compiling different versions of the BB based on N_bit and N_CH.
The results, shown in Figure 16, highlight a dependence of the ratio α, defined as f_CLK,BB / f_FPGA^MAX (where f_FPGA^MAX is 625 MHz for the Artix-7 and 800 MHz for the Kintex-7), on the product N, defined as N_CH × N_bit; we can observe a drop in α when the ratio between the Logic Cells (LCs) of the device and N falls below a value roughly between 300 and 500. This trend is due to routing difficulties caused by the reduction in available resources, indicated by the number of LCs provided by the device, and to the linearity with which the internal modules of the BB scale in terms of area occupation.
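As a numerical cross-check of this criterion with the measured 16-channel, 32-bit implementation:

\( \mathrm{LC}/N = 326{,}080 / (16 \times 32) \approx 637, \qquad \alpha = 200\ \mathrm{MHz} / 800\ \mathrm{MHz} = 25\% \)

a ratio well above the 300–500 drop region, consistent with the α achieved on the Kintex-7 325T.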

7. Conclusions

This work focuses on a new timestamp management system called the BB. The objective is to implement a parallel-to-serial conversion to ease the routing of timestamps from high-performance TDCs (i.e., high resolution and FSR, resulting in a high number of bits at a high rate) to the processing modules in multichannel applications. The key characteristic of this method is the serialization of several TDC channels in a modular approach, producing timestamps in chronological order while flagging overflows.
In this paper, two issues have been addressed, and two new functionalities have been introduced. The first issue pertains to the occurrence of unsorted timestamps due to the CDC between the clock of the TDC and the BB. Subsequently, by enhancing the FPGA’s critical path operations, a second issue related to the Belt-Bus’s restricted output rate was resolved.
Additionally, two new features have been incorporated. The first is a Virtual Delay, utilized to compensate for offsets resulting from varying wire lengths between TDC channels and mismatched FPGA routing circuits. The second is Virtual Dead Time, employed to eliminate unforeseen events caused by residual noise at the TDC input.
The BB has been tested on a Xilinx 28 nm 7-Series Kintex-7 325T FPGA, yielding an overall data rate of 199.9 Msps with very limited resource usage (i.e., less than a total of 4.5%) and a power consumption of only 480 mW, considering a 16-channel implementation.

Author Contributions

Methodology, F.G.; Software, G.B.; Validation, E.R. and A.C.; Writing—original draft, N.L.; Writing—review & editing, A.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available in this article.

Acknowledgments

A special thanks goes to TEDIEL S.r.l., a spin-off of Politecnico di Milano, for providing the TDC IP-Core.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lusardi, N.; Geraci, A. 8-Channels high-resolution TDC in FPGA. In Proceedings of the 2015 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), San Diego, CA, USA, 31 October–7 November 2015; pp. 1–2. [Google Scholar] [CrossRef]
  2. Tancock, S.; Arabul, E.; Dahnoun, N. A Review of New Time-to-Digital Conversion Techniques. IEEE Trans. Instrum. Meas. 2019, 68, 3406–3417. [Google Scholar] [CrossRef]
  3. Portaluppi, D.; Pasquinelli, K.; Cusini, I.; Zappa, F. Multi-Channel FPGA Time-to-Digital Converter With 10 ps Bin and 40 ps FWHM. IEEE Trans. Instrum. Meas. 2022, 71, 2002109. [Google Scholar] [CrossRef]
  4. Wang, Y.; Xie, W.; Chen, H.; Li, D.D.U. Multichannel Time-to-Digital Converters with Automatic Calibration in Xilinx Zynq-7000 FPGA Devices. IEEE Trans. Ind. Electron. 2022, 69, 9634–9643. [Google Scholar] [CrossRef]
  5. Lewellen, T.K. Time-of-flight PET. Semin. Nucl. Med. 1998, 28, 268–275. [Google Scholar] [CrossRef] [PubMed]
  6. Li, Y.; Ibanez-Guzman, J. Lidar for Autonomous Driving: The Principles, Challenges, and Trends for Automotive Lidar and Perception Systems. IEEE Signal Process. Mag. 2020, 37, 50–61. [Google Scholar] [CrossRef]
  7. Costa, A.; Lusardi, N.; Garzetti, F.; Ronconi, E.; Maffessanti, S.; Danilevski, C.; Lomidze, D.; Turcato, M.; Porro, M.; Geraci, A. A Study of the Latest Updates of the DAQ Firmware for the DSSC Camera at the European XFEL. IEEE Access 2023, 11, 84323–84335. [Google Scholar] [CrossRef]
  8. Garzetti, F.; Salgaro, S.; Venialgo, E.; Lusardi, N.; Corna, N.; Geraci, A.; Charbon, E. Plug-and-play TOF-PET Module Readout Based on TDC-on-FPGA and Gigabit Optical Fiber Network. In Proceedings of the 2019 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), Manchester, UK, 26 October–2 November 2019; pp. 1–4. [Google Scholar] [CrossRef]
  9. Nogrette, F.; Heurteau, D.; Chang, R.; Bouton, Q.; Westbrook, C.; Sellem, R.; Clément, D. Characterization of a detector chain using a FPGA-based Time-to-Digital Converter to reconstruct the three-dimensional coordinates of single particles at high flux. Rev. Sci. Instrum. 2015, 86, 113105. [Google Scholar] [CrossRef] [PubMed]
  10. Chiu, C.L.; Fei, L.Y.; Liu, J.K.; Wu, M.C. National airborne LiDAR mapping and examples for applications in deep-seated landslides in Taiwan. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4688–4691. [Google Scholar] [CrossRef]
  11. Stoppa, D.; Gonzo, L.; Simoni, A. Scannerless 3D imaging sensors. In Proceedings of the IEEE International Workshop on Imaging Systems and Techniques, Niagara Falls, ON, Canada, 13 May 2005; pp. 58–61. [Google Scholar] [CrossRef]
  12. Lusardi, N.; Garzetti, F.; Costa, A.; Ronconi, E.; Geraci, A. From Multiphase to Novel Single-Phase Multichannel Shift-Clock Fast Counter Time-to-Digital Converter. IEEE Trans. Ind. Electron. 2023, 1–9. [Google Scholar] [CrossRef]
  13. Machado, R.; Cabral, J.; Alves, F.S. Recent Developments and Challenges in FPGA-Based Time-to-Digital Converters. IEEE Trans. Instrum. Meas. 2019, 68, 4205–4221. [Google Scholar] [CrossRef]
  14. Wang, Y.; Xie, W.; Chen, H.; Li, D.D.U. Low-Hardware Consumption, Resolution-Configurable Gray Code Oscillator Time-to-Digital Converters Implemented in 16 nm, 20 nm, and 28 nm FPGAs. IEEE Trans. Ind. Electron. 2023, 70, 4256–4266. [Google Scholar] [CrossRef]
  15. AMBA 4 AXI4-Stream Protocol Specification. Available online: https://developer.arm.com/documentation/ihi0051/a/Introduction/About-the-AXI4-Stream-protocol (accessed on 10 March 2024).
  16. Sesta, V.; Incoronato, A.; Madonini, F.; Villa, F. Time-to-digital converters and histogram builders in SPAD arrays for pulsed-LiDAR. Measurement 2023, 212, 112705. [Google Scholar] [CrossRef]
  17. Timepix. Available online: https://medipix.web.cern.ch/taxonomy/term/221 (accessed on 10 March 2024).
  18. Timepix3. Available online: https://medipix.web.cern.ch/taxonomy/term/236 (accessed on 10 March 2024).
  19. PETsys TOF PET ASIC. Available online: https://www.petsyselectronics.com/web/product1 (accessed on 10 March 2024).
  20. PETsys TOFPET2 ASIC. Available online: https://www.petsyselectronics.com/web/public/products/1 (accessed on 10 March 2024).
  21. Zhang, M.; Wang, H.; Liu, Y. A 7.4 ps FPGA-Based TDC with a 1024-Unit Measurement Matrix. Sensors 2017, 17, 865. [Google Scholar] [CrossRef] [PubMed]
  22. PicoTDC. Available online: https://kt.cern/sites/default/files/technology/picotdc/tech-brief/picotdc_0.pdf (accessed on 10 March 2024).
  23. Wellheuser, C. Metastability Performance of Clocked FIFOs; Texas Instruments Inc.: Dallas, TX, USA, 1996. [Google Scholar]
  24. TEDIEL. Available online: https://tediel.com/ (accessed on 10 March 2024).
Figure 1. Fields of BB in TDATA signal.
Figure 2. Structure of the Belt-Bus.
Figure 3. Timing diagram showing the phases through which the timestamps 38, 40, and 42 enter the node chain.
Figure 4. Node Inserter IP-Core; m00_bb is the output port and s00_axi and s00_bb are the “Left” and “Top” input ports, respectively.
Figure 6. Connection between the TDC (orange), clocked at clk TDC, and the BB (blue), clocked at clk BB, is established using an asynchronous FIFO (Async. FIFO, green) employed as CDC.
Figure 7. Basic Clock Domain Crossing structure and waveforms.
Figure 8. Waveforms with CDCs modeled.
Figure 9. Order Checker IP-Core.
Figure 10. Modular structure of the Node Inserter with the Virtual Delay functionality.
Figure 11. Waveform of the Virtual Dead-Time functionality.
Figure 12. Time Killer IP-Core.
Figure 13. Structure of the improved BB.
Figure 14. Picture of the FELIX board (left) hosting the Artix-7 100T for the 3-channel TDC IP-Core and the setup (right).
Figure 15. Picture of the Panther board (left) hosting the Kintex-7 325T for the 16-channel TDC IP-Core and the setup (right).
Figure 16. Dependence of the ratio α = f_CLK,BB / f_FPGA^MAX on the product N = N_CH × N_bit for different Artix-7 and Kintex-7 devices.
Table 1. Area occupancy and power dissipation of the Node Inserter and its related submodules clocked at 130 MHz, as presented in Section 3.

N_bit | Module/Submodule   | Power [mW] | CARRY | LUT | FF  | LUTRAM
16    | Node Inserter      | 7          | 8     | 137 | 148 | 20
      | Delay Synchronizer | 5          | -     | 126 | 64  | 20
      | Inserter           | 1          | 8     | 2   | 2   | -
      | Super Sampler      | 1          | -     | 9   | 82  | -
24    | Node Inserter      | 7          | 12    | 149 | 180 | 24
      | Delay Synchronizer | 5          | -     | 139 | 114 | 24
      | Inserter           | 1          | 12    | 2   | 2   | -
      | Super Sampler      | 1          | -     | 8   | 64  | -
32    | Node Inserter      | 9          | 16    | 169 | 212 | 28
      | Delay Synchronizer | 7          | -     | 158 | 128 | 28
      | Inserter           | <1         | 20    | 2   | 2   | -
      | Super Sampler      | 2          | -     | 9   | 82  | -
40    | Node Inserter      | 10         | 20    | 234 | 244 | 36
      | Delay Synchronizer | 8          | -     | 223 | 144 | 36
      | Inserter           | <1         | 20    | 2   | 2   | -
      | Super Sampler      | 2          | -     | 9   | 98  | -
48    | Node Inserter      | 11         | 24    | 264 | 276 | 40
      | Delay Synchronizer | 9          | -     | 253 | 150 | 40
      | Inserter           | <1         | 24    | 2   | 2   | -
      | Super Sampler      | 2          | -     | 9   | 124 | -
56    | Node Inserter      | 13         | 28    | 292 | 308 | 44
      | Delay Synchronizer | 9          | -     | 281 | 176 | 44
      | Inserter           | 1          | 28    | 2   | 2   | -
      | Super Sampler      | 2          | -     | 9   | 130 | -
64    | Node Inserter      | 14         | 32    | 324 | 340 | 48
      | Delay Synchronizer | 10         | -     | 313 | 189 | 48
      | Inserter           | 1          | 32    | 2   | 2   | -
      | Super Sampler      | 3          | -     | 9   | 149 | -
Table 2. Area occupancy and power dissipation of the Node Inserter and its related submodules, as presented in Section 4.

N_bit | Module/Submodule    | Power [mW] | CARRY | LUT | FF  | LUTRAM
16    | Node Inserter       | 10         | 4     | 128 | 192 | 20
      | Delay Synchronizer  | 8          | -     | 127 | 108 | 20
      | Inserter            | <1         | 4     | 2   | 2   | -
      | AXIS Register Slice | 2          | -     | 9   | 82  | -
24    | Node Inserter       | 10         | 6     | 150 | 240 | 24
      | Delay Synchronizer  | 8          | -     | 140 | 174 | 24
      | Inserter            | 1          | 6     | 2   | 2   | -
      | AXIS Register Slice | 1          | -     | 8   | 64  | -
32    | Node Inserter       | 13         | 8     | 170 | 288 | 28
      | Delay Synchronizer  | 10         | -     | 159 | 204 | 28
      | Inserter            | 1          | 8     | 2   | 2   | -
      | AXIS Register Slice | 2          | -     | 9   | 82  | -
40    | Node Inserter       | 15         | 10    | 236 | 336 | 36
      | Delay Synchronizer  | 12         | -     | 224 | 236 | 36
      | Inserter            | <1         | 10    | 2   | 2   | -
      | AXIS Register Slice | 3          | -     | 9   | 98  | -
48    | Node Inserter       | 17         | 12    | 265 | 384 | 40
      | Delay Synchronizer  | 13         | -     | 254 | 258 | 40
      | Inserter            | <1         | 12    | 2   | 2   | -
      | AXIS Register Slice | 4          | -     | 9   | 124 | -
56    | Node Inserter       | 19         | 14    | 293 | 432 | 44
      | Delay Synchronizer  | 14         | -     | 282 | 300 | 44
      | Inserter            | 1          | 14    | 2   | 2   | -
      | AXIS Register Slice | 4          | -     | 9   | 130 | -
64    | Node Inserter       | 21         | 16    | 325 | 480 | 48
      | Delay Synchronizer  | 16         | -     | 314 | 329 | 48
      | Inserter            | <1         | 16    | 2   | 2   | -
      | AXIS Register Slice | 5          | -     | 9   | 149 | -
Table 3. Area occupancy and power dissipation of the Node Inserter and its related submodules, as presented in Section 5.

N_bit | Module/Submodule       | Power [mW] | CARRY | LUT | FF   | LUTRAM
16    | Node Inserter          | 18         | 4     | 328 | 453  | 58
      | Virtual Delay Inserter | 8          | -     | 190 | 261  | 38
      | Delay Synchronizer     | 8          | -     | 127 | 108  | 20
      | Inserter               | <1         | 4     | 2   | 2    | -
      | AXIS Register Slice    | 2          | -     | 9   | 82   | -
      | Time Killer            | 7          | -     | 35  | 159  | -
24    | Node Inserter          | 18         | 6     | 409 | 632  | 72
      | Virtual Delay Inserter | 8          | -     | 259 | 392  | 48
      | Delay Synchronizer     | 8          | -     | 140 | 174  | 24
      | Inserter               | 1          | 6     | 2   | 2    | -
      | AXIS Register Slice    | 1          | -     | 8   | 64   | -
      | Time Killer            | 7          | -     | 72  | 223  | -
32    | Node Inserter          | 23         | 8     | 429 | 785  | 84
      | Virtual Delay Inserter | 10         | -     | 259 | 497  | 56
      | Delay Synchronizer     | 10         | -     | 159 | 204  | 28
      | Inserter               | 1          | 8     | 2   | 2    | -
      | AXIS Register Slice    | 2          | -     | 9   | 82   | -
      | Time Killer            | 7          | -     | 94  | 415  | -
40    | Node Inserter          | 27         | 10    | 549 | 937  | 108
      | Virtual Delay Inserter | 12         | -     | 314 | 601  | 72
      | Delay Synchronizer     | 12         | -     | 224 | 236  | 36
      | Inserter               | <1         | 10    | 2   | 2    | -
      | AXIS Register Slice    | 3          | -     | 9   | 98   | -
      | Time Killer            | 10         | -     | 101 | 407  | -
48    | Node Inserter          | 30         | 12    | 633 | 1090 | 120
      | Virtual Delay Inserter | 13         | -     | 368 | 706  | 80
      | Delay Synchronizer     | 13         | -     | 254 | 258  | 40
      | Inserter               | <1         | 12    | 2   | 2    | -
      | AXIS Register Slice    | 4          | -     | 9   | 124  | -
      | Time Killer            | 16         | -     | 126 | 479  | -
56    | Node Inserter          | 33         | 14    | 716 | 1242 | 132
      | Virtual Delay Inserter | 14         | -     | 423 | 810  | 88
      | Delay Synchronizer     | 14         | -     | 282 | 300  | 44
      | Inserter               | 1          | 14    | 2   | 2    | -
      | AXIS Register Slice    | 4          | -     | 9   | 130  | -
      | Time Killer            | 18         | -     | 140 | 415  | -
64    | Node Inserter          | 37         | 16    | 802 | 1395 | 144
      | Virtual Delay Inserter | 16         | -     | 477 | 915  | 96
      | Delay Synchronizer     | 16         | -     | 314 | 329  | 48
      | Inserter               | <1         | 16    | 2   | 2    | -
      | AXIS Register Slice    | 5          | -     | 9   | 149  | -
      | Time Killer            | 20         | -     | 161 | 535  | -
Table 4. TDC performance.

Feature              | Value
Number of Channels   | 3 and 16
N_bit                | 32
LSB                  | 36.6 fs
FSR                  | 157.3 μs
f_ovfl               | 6.36 kHz
Dead Time            | 5 ns
Maximum Channel Rate | 120 MHz
ODR/Ch               | 120 Msps
Precision            | <12 ps r.m.s.
DNL                  | <800 fs
INL                  | <16 ps
LUT/Channel          | 3869
FF/Channel           | 5255
LUTRAM/Channel       | 75
CARRY/Channel        | 390
BRAM/Channel         | 2
Power/Channel        | 284 mW
Table 5. BB performance for the different versions of the BB proposed in this work, for the 3-channel TDC.

Feature                   | Section 3  | Section 4  | Section 5
Number of Channels        | 3          | 3          | 3
N_bit                     | 32         | 32         | 32
LSB                       | 36.6 fs    | 36.6 fs    | 36.6 fs
FSR                       | 157.3 μs   | 157.3 μs   | 157.3 μs
f_CLK,BB                  | 130 MHz    | 150 MHz    | 150 MHz
f_ovfl                    | 6.36 kHz   | 6.36 kHz   | 6.36 kHz
ODR                       | 129.9 Msps | 149.9 Msps | 149.9 Msps
Rate w/o Discard          | 126.0 Msps | 146.0 Msps | 145.0 Msps
Dead Time                 | 5 ns       | 5 ns       | 5 ns ÷ 1 ms
Virtual Delay             | N.A.       | N.A.       | 0 ÷ 78.6 μs
Unsorted Timestamp        | 1%         | N.A.       | N.A.
BB Total Occupancy LUT    | 0.96%      | 0.96%      | 2.43%
BB Total Occupancy FF     | 0.49%      | 0.69%      | 1.79%
BB Total Occupancy LUTRAM | 0.91%      | 0.91%      | 2.74%
BB Total CARRY            | 0.30%      | 0.15%      | 0.15%
TDC Total Occupancy LUT   | 18.5%      | 18.5%      | 18.5%
TDC Total Occupancy FF    | 12.6%      | 12.6%      | 12.6%
TDC Total Occupancy LUTRAM| 0.55%      | 0.55%      | 0.55%
TDC Total CARRY           | 7.43%      | 7.43%      | 7.43%
TDC Total BRAM            | 4.39%      | 4.39%      | 4.39%
BB Total Power            | 144 mW     | 208 mW     | 480 mW
TDC Total Power           | 4544 mW    | 4544 mW    | 4544 mW
Table 6. BB performance for the different versions of the BB proposed in this work, for the 16-channel TDC.

Feature                   | Section 3  | Section 4  | Section 5
Number of Channels        | 16         | 16         | 16
N_bit                     | 32         | 32         | 32
LSB                       | 36.6 fs    | 36.6 fs    | 36.6 fs
FSR                       | 157.3 μs   | 157.3 μs   | 157.3 μs
f_CLK,BB                  | 130 MHz    | 200 MHz    | 200 MHz
f_ovfl                    | 6.36 kHz   | 6.36 kHz   | 6.36 kHz
ODR                       | 129.9 Msps | 199.9 Msps | 199.9 Msps
Rate w/o Discard          | 126.0 Msps | 193.0 Msps | 192.0 Msps
Dead Time                 | 5 ns       | 5 ns       | 5 ns ÷ 1 ms
Virtual Delay             | N.A.       | N.A.       | 0 ÷ 78.6 μs
Unsorted Timestamp        | 1%         | N.A.       | N.A.
BB Total Occupancy LUT    | 1.58%      | 1.58%      | 3.98%
BB Total Occupancy FF     | 0.80%      | 1.08%      | 2.94%
BB Total Occupancy LUTRAM | 1.50%      | 1.50%      | 4.50%
BB Total CARRY            | 0.50%      | 0.25%      | 0.25%
TDC Total Occupancy LUT   | 30.4%      | 30.4%      | 30.4%
TDC Total Occupancy FF    | 20.6%      | 20.6%      | 20.6%
TDC Total Occupancy LUTRAM| 0.9%       | 0.9%       | 0.9%
TDC Total CARRY           | 12.2%      | 12.2%      | 12.2%
TDC Total BRAM            | 7.2%       | 7.2%       | 7.2%
BB Total Power            | 144 mW     | 208 mW     | 480 mW
TDC Total Power           | 4544 mW    | 4544 mW    | 4544 mW
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
