An 8.8 ps RMS Resolution Time-To-Digital Converter Implemented in a 60 nm FPGA with Real-Time Temperature Correction.

This paper presents a non-uniform multiphase (NUMP) time-to-digital converter (TDC) implemented in a field-programmable gate array (FPGA) with real-time automatic temperature compensation. The NUMP-TDC is a novel, low-cost, high-performance TDC that has achieved excellent performance in an Altera Cyclone V FPGA, where the intrinsic timing resolution was 2.3 ps root mean square (RMS). However, the propagation delays in the delay chain of some FPGAs (for example, the Altera Cyclone 10 LP) vary significantly as the temperature changes. Thus, the timing performance of NUMP-TDCs implemented in those FPGAs is significantly affected by temperature fluctuations. In this study, a simple method was developed to monitor variations in propagation delay using two registers deployed at both ends of the delay chain and to compensate for those variations using a look-up table (LUT). When the variations exceeded a certain threshold, the LUT for the delay correction was updated, and a bin-by-bin correction was launched. Using this correction approach, a resolution of 8.8 ps RMS over a wide temperature range (5 °C to 80 °C) was achieved in a NUMP-TDC implemented in a Cyclone 10 LP FPGA.

Field-programmable gate array (FPGA)-based TDCs, constructed with off-the-shelf, low-cost components, offer a promising practical alternative to conventional application-specific integrated circuit (ASIC)-based TDCs. Most high-resolution FPGA-based TDCs are constructed using carry chains.
A key feature of the NUMP TDC is that it is "non-uniform", which means that the phase relationships between the multiple clocks that are output from the PSCG are neither uniformly distributed nor sorted in time order. Thus, as shown in Figure 1b, a clock sorting module is designed to sort the clocks. Both the rising edges and the falling edges are captured, and their values are used to calculate the fine timestamp in the decode module.
The other modules of the NUMP TDC will be introduced with the temperature correction module later.

Temperature Characteristics of Delay Chain
In previous work, Pan et al. [18] used a complementary metal oxide semiconductor (CMOS) inverter as an example to qualitatively analyze the effect of temperature on the delay chain of Cyclone II FPGA devices and suggested that carrier mobility and threshold voltage affected the propagation delay. A series of experiments was performed, and the results indicated that temperature fluctuations had similar effects on all bins in a delay chain of limited length (less than 105 delay units). Although the Cyclone 10 LP is a newer member of the Cyclone FPGA series, its positioning and underlying logic architecture are similar to those of the Cyclone II. In both devices, the logic array block (LAB) consists of logic elements (LEs) and has the same carry chain structure [29,30]. Therefore, their delay chains have similar temperature characteristics. The delay chain is discussed further in Section 5.
The principle of the NUMP TDC is quite different from that of a wave-union TDC [15]. Therefore, the conclusion reached in the previous study cannot be used directly to support the temperature correction method in this paper. Based on Pan's research, a further conjecture can be proposed. The variation in propagation delay of each delay unit (hereinafter called a 'cell') is roughly the same when the temperature changes. If the input of a delay chain is a clock signal instead of a hit signal, the delay at the last cell of the chain is therefore always longer than at the first cell, because the delay accumulates cell by cell. As the number of cells increases, the propagation delay of the clock signals is expected to increase linearly.
Sensors 2020, 20, x FOR PEER REVIEW 4 of 21
Experimental studies were performed to validate this hypothesis. The clock sorting operation was performed repeatedly while the temperature was raised from 10 °C to 70 °C in 10 °C steps. The results of the clock sorting operation measured at 10 °C were selected as the reference. The reference was subtracted from the measurements at the other temperatures to obtain the change in propagation delay (hereinafter called the 'variation of edge values') for each cell. Figure 2 shows the variation of the edge values measured at temperatures from 20 °C to 70 °C in 10 °C steps. As shown in Figure 2, the edge values of each cell at the other temperatures differed from those obtained by the clock sorting operation at 10 °C. Therefore, the temperature of the FPGA (more precisely, of the delay chain) can be measured using the variation in edge values. Figure 3 shows the variation in delay increment with FPGA temperature, where the delay increment is the slope of the corresponding curve in Figure 2. This graph had excellent linearity, with an R² value of 0.998. The slope of the fitted line was 0.027, meaning that for each 1 °C increase (or decrease) in temperature, the propagation delay variation increases (or decreases) by 0.054 ps (0.027 × 2). Although this linear relationship exists between the propagation delay and the temperature, it is not a prerequisite for the temperature correction method proposed in this paper.
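As a rough plausibility check of the linear model, the following Python sketch converts the reported increment into an expected accumulated edge shift at a given cell. Reading the 0.054 ps figure as a per-cell, per-°C increment is an assumption made here, and the function name `expected_edge_shift_ps` is purely illustrative.

```python
# Reported in the text: slope 0.027, doubled to 0.054 ps per degree Celsius.
# Treating this as a per-cell increment is an assumption of this sketch.
PS_PER_CELL_PER_DEG_C = 0.027 * 2


def expected_edge_shift_ps(cell_index: int, delta_t_deg_c: float) -> float:
    """Edge shift accumulated at `cell_index` for a temperature change,
    assuming every cell adds the same delay increment per degree."""
    return cell_index * PS_PER_CELL_PER_DEG_C * delta_t_deg_c


shift_ps = expected_edge_shift_ps(cell_index=64, delta_t_deg_c=60.0)
print(f"Expected shift at cell 64 for a 60 degree rise: {shift_ps:.1f} ps")
```

Under these assumptions the estimate is about 207 ps, which is close to the 211 ps shift reported in this paper for the rising edge of the 64th cell between 10 °C and 70 °C.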

Figure 3. Variation in delay increment with field-programmable gate array (FPGA) temperature.

Temperature Measurement Structure
The basic concept of the proposed temperature correction method for the NUMP TDC is as follows. A code density test is used to measure the edge values of one or more cells in the delay chain, and the results are compared with the edge values stored in the LUT. If the edge value variation exceeds a specified temperature correction threshold, the NUMP TDC switches to temperature correction mode, and the original LUT is updated to reflect the current temperature state.
The temperature of the FPGA delay chain can be measured using two methods, as shown in Figure 4. In the first method, the temperature state is monitored using dedicated registers, as shown in Figure 4a. The advantage of this method is that the results accurately reflect the temperature state of the delay chain. However, this method has an obvious shortcoming: the delay chain states cannot be sampled simultaneously by the hit signal and the resort signal, so a timer is required to test the temperature periodically. This inevitably results in a larger dead time.
The second method, shown in Figure 4b, uses thermometric registers beside the delay chain to monitor variations in the edge values of the thermometric cells. This allows the temperature state of the delay chain to be monitored at any time. The biggest advantage of this method is that time measurement and temperature measurement are independent of each other. The thermometric registers are not as precise as the dedicated registers and may not capture the delay chain temperature as accurately. However, this difference can be minimized by using a logic lock tool in the design platform. By deploying the thermometric registers around the delay chain, the clock states in the delay chain can be fed into both types of registers almost simultaneously. This makes it possible to use additional thermometric registers rather than dedicated delay chain registers to monitor the delay chain temperature. Thus, the second method was used for the proposed temperature correction module.

Top-Level Diagram of NUMP TDC with Temperature Correction Module
Figure 5 provides a diagram of the NUMP TDC with the temperature correction module. This design differs from a conventional NUMP TDC in two major ways. First, two additional thermometric registers are deployed at both ends of the delay chain to monitor variations in the edge values. Second, the LUT is updated in real time to compensate for temperature effects on the delay chain. The NUMP TDC with the temperature correction module has three operation modes.
In the initialization clock sorting mode, a calibration signal CAL is selected to latch the clock states in the registers, and the sampling results are calibrated by the clock sorting module, which runs on the internal Nios soft-core CPU of the FPGA. The phases (the edge values) of each cell are calibrated, sorted, and stored in the LUT. Once this step is complete, time intervals can be measured by the NUMP TDC.
In normal operation mode, clock states with random phase shifts are sampled simultaneously by the hit signal. The sampled clock states are sent to the decoding module to calculate the fine timestamp. The hit signal is also fed to the coarse time module, which outputs a 16-bit coarse timestamp and a check bit used to correct metastability errors. The fine timestamp and the coarse timestamp with its check bit are then combined, buffered in a first-in, first-out (FIFO) buffer, and transmitted to the host PC over a USB cable. Simultaneously, the thermometric clock sorting module periodically measures the edge values of the thermometric cells. The variation in edge values is fed to the temperature correction start module (hereinafter called the 'TC start module') for analysis. The NUMP TDC automatically switches to temperature correction mode if the variation is larger than a given threshold.
In the temperature correction mode, the edge values are corrected. The edge values of each cell are calibrated and sorted, and the LUT is updated. Once the process is completed, the TDC switches back to the normal operation mode.
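The three operation modes can be summarized as a small state machine. The Python sketch below is only an illustration of the transitions described above. The names `Mode` and `next_mode` and the flag arguments are hypothetical, and the trend check performed by the TC start module is deliberately left out.

```python
from enum import Enum, auto


class Mode(Enum):
    INIT_CLOCK_SORTING = auto()
    NORMAL = auto()
    TEMPERATURE_CORRECTION = auto()


def next_mode(mode, *, sorting_done=False, variation_ps=0.0,
              threshold_ps=0.0, correction_done=False):
    """Return the next operation mode (hypothetical interface)."""
    if mode is Mode.INIT_CLOCK_SORTING:
        # Time measurement can start once the LUT has been filled.
        return Mode.NORMAL if sorting_done else mode
    if mode is Mode.NORMAL:
        # Switch only when the edge-value variation exceeds the threshold.
        if abs(variation_ps) > threshold_ps:
            return Mode.TEMPERATURE_CORRECTION
        return mode
    if mode is Mode.TEMPERATURE_CORRECTION:
        # Return to measuring once the LUT has been corrected and re-sorted.
        return Mode.NORMAL if correction_done else mode
    raise ValueError(f"unknown mode: {mode}")
```

For example, `next_mode(Mode.NORMAL, variation_ps=30.0, threshold_ps=20.0)` yields `Mode.TEMPERATURE_CORRECTION`, while a 5 ps variation leaves the TDC in normal operation.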

Top-Level Diagram of Temperature Correction Module
A diagram of the temperature correction module is shown in Figure 6. The temperature correction module operates online and in real time. Since temperature variations tend to occur very slowly, it is not necessary to monitor the temperature continuously. Instead, a timer triggers periodic measurement of the delay chain temperature by the thermometric clock sorting module. After each measurement, the absolute value of the difference between the edge values of the thermometric cells is fed to the TC start module for analysis of the current temperature. The analysis proceeds as follows. If no temperature change trend is identified, the TDC remains in normal operation mode. If a temperature change trend is identified, the module checks whether the change is larger than the given threshold. If so, the TDC switches to temperature correction mode, i.e., temperature correction is launched and time measurement is suspended. Otherwise, the TDC remains in normal operation mode.
Once temperature correction starts, the edge values in the LUT are corrected individually by the LUT correction module. This disturbs the original order of the edge values. Therefore, a LUT reorder module is introduced to reorder the edge values in the LUT after each correction. Once the reorder process is completed and the entire temperature correction process is finished, the TDC switches back to normal operation mode.
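The following Python sketch illustrates why a reorder step is needed after a bin-by-bin correction. How the per-cell correction is actually derived is not specified in this section, so the sketch assumes, purely for illustration, that the shift of each cell is linearly interpolated between the variations measured at the two thermometric cells. The function name and the LUT layout (a list of cell index and edge value pairs sorted by edge value) are also assumptions.

```python
def correct_and_reorder(lut, van_cell, tail_cell, d_van_ps, d_tail_ps,
                        period_ps=2500.0):
    """Shift every edge value, then re-sort the LUT by edge value.

    lut: list of (cell_index, edge_ps) tuples sorted by edge_ps.
    The linear interpolation between the two thermometric cells is an
    assumption of this sketch, not the method stated in the text.
    """
    slope = (d_tail_ps - d_van_ps) / (tail_cell - van_cell)
    corrected = []
    for cell, edge_ps in lut:
        shift = d_van_ps + slope * (cell - van_cell)
        # Edge values are phases within one clock period, so wrap around.
        corrected.append((cell, (edge_ps + shift) % period_ps))
    # Cells far down the chain move further, so the order may change.
    corrected.sort(key=lambda entry: entry[1])
    return corrected


lut = [(64, 90.0), (1, 100.0)]
new_lut = correct_and_reorder(lut, van_cell=1, tail_cell=64,
                              d_van_ps=0.0, d_tail_ps=63.0)
```

In this toy case, cell 64 is shifted by 63 ps and cell 1 is not shifted, so the two entries swap places in the sorted LUT.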

Clock Sorting Operation for Temperature Measurement
This paper proposed a simple method to monitor the temperature of delay chains. Two thermometric registers are deployed at both ends of the delay chains, and a thermometric clock sorting operation is carried out periodically to obtain the edge values of the thermometric cells. The variation in edge values is calculated and fed into the TC start module to determine whether temperature correction should be performed.
The thermometric clock sorting operation is based on the code density test [13]. Two signals are used during this process: the reference signal C0 and the signal to be measured, CX, which is the thermometric cell clock. The frequency of the resort signal is 6.7821 MHz, which is unrelated to the 400 MHz system clock. Thus, each cell is sampled with the same probability. A large number of random samples is taken so that the resort signal is distributed evenly across the whole clock cycle. The proportion of samples in which C0 and CX take each of the four states "10", "11", "01", and "00" is calculated. These proportions are multiplied by the clock period to obtain the edge values of CX.
There are always two possible relationships between C0 and CX. For example, if the proportions of the four cases are 0.2, 0.2, 0.3, and 0.3, the rising and falling edge positions are restricted to the two cases shown in Figure 7.
For the initialization clock sorting operation, CAL_1 is used to obtain the two possible relationships between the reference clock C0 and any other clock CX. The role of CAL_2 is to distinguish between these two cases, and it is not required in temperature correction mode. By transmitting the results of the clock sorting operation to the host computer for analysis, the phase relationship between each thermometric cell's signal CX and the reference signal C0 can be confirmed.
It should be noted that the thermometric cells at both ends of the delay chains referred to in this paper are not necessarily the first and last cells in the absolute sense, but rather cells near the start and end of the delay chain. To save computing resources and shorten the correction process, a cell whose number is a power of 2 is preferred as the thermometric cell. For example, the 64th cell can be selected if there are 95 cells in each delay chain. Figure 8 shows the clock signal of the 64th cell at TDC temperatures of 10 °C and 70 °C. As the temperature increases from 10 °C to 70 °C, the rising edge moves from 363 ps to 574 ps, i.e., an edge increment of 211 ps, which is 8.44% of the 2500 ps clock cycle.
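The edge-value estimate used by the clock sorting operation (proportion of a sampled state multiplied by the clock period) can be illustrated with a small simulation. The Python sketch below is not the FPGA implementation. It assumes ideal 50% duty-cycle clocks, an edge position below half a period, and uniformly random sampling instants, and all function names are illustrative. The edge positions 363 ps and 574 ps are the values reported for the 64th cell at 10 °C and 70 °C.

```python
import random

CLOCK_PERIOD_PS = 2500.0  # 400 MHz system clock


def sample_states(edge_ps, n_samples, rng):
    """Sample C0 and CX at random instants (assumed 50% duty cycle,
    CX delayed by edge_ps relative to C0) and return states like "10"."""
    half = CLOCK_PERIOD_PS / 2
    states = []
    for _ in range(n_samples):
        t = rng.uniform(0.0, CLOCK_PERIOD_PS)
        c0 = 1 if t < half else 0
        cx = 1 if (t - edge_ps) % CLOCK_PERIOD_PS < half else 0
        states.append(f"{c0}{cx}")
    return states


def rising_edge_ps(states):
    """Edge value = proportion of "10" samples times the clock period."""
    return states.count("10") / len(states) * CLOCK_PERIOD_PS


rng = random.Random(0)
edge_ref = rising_edge_ps(sample_states(363.0, 200_000, rng))  # 10 degrees C
edge_now = rising_edge_ps(sample_states(574.0, 200_000, rng))  # 70 degrees C
delta_edge = edge_now - edge_ref
print(f"estimated edge variation: {delta_edge:.0f} ps")
```

With enough samples the estimated variation approaches the 211 ps difference between the two edge positions, and the statistical error shrinks as the number of samples grows.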

Temperature Correction Start Module
The TC start module operates as follows. It firstly assesses whether the current temperature of the delay chain has increased, decreased, or is substantially unchanged compared to the previous temperature. If there is a clear and strong temperature trend, temperature correction is launched. Otherwise, TDC remains in normal operation mode.
For accurate temperature trend measurement, the temperature measurement frequency, which is controlled by a timer, should be set to a suitable frequency. However, a temperature measurement frequency that is too high will increase the randomness of the temperature measurement, which may increase the probability of incorrectly identifying a temperature trend. Therefore, strict trend judgment criteria are required. In this paper, two counters, named rise_cnt and fall_cnt, were used to determine the temperature trend. Both counters started at zero and were incremented each time its corresponding temperature mark bit was asserted. The difference and sum of rise_cnt and fall_cnt were both calculated. Before the sum of the two counters reached a pre-defined value, if the difference between the two counters was greater than half this value, a temperature trend was identified with the larger counter, indicating whether the temperature was rising or falling.
Once it is evident that there is a temperature trend, a threshold judgment is performed. If Equation (2) is true, the temperature correction process is launched, and the LUT correction module and the LUT reorder module are sequentially triggered to correct and sort edge values and store the new values in the LUT. This variation in edge values due to changes in temperature is small relative to the overall clock cycle. Therefore, in contrast with the thermometric clock sorting operation module, it is not necessary to count the proportions of each of the four cases, as counting a single case will be sufficient. This greatly reduces programming complexity and saves FPGA resources. For example, to calculate the variation in the rising edge of the 64th cell, we only need to count the frequency of "10" cases. The frequency of each case is divided by the total number and multiplied by the system clock cycle to get the edge values at the current temperature. After the initialization clock sorting operation is completed, the thermometric clock sorting operation starts after one timer period. For any specific temperature, the variation in edge value can be represented by Equation (1) as: where n is the thermometric cell number, edge * n is the rising/falling edge value when the temperature varies, and edge n is the original rising/falling edge value.
A positive value of ∆edge indicates that the current temperature of the delay chain is higher than the original temperature and the temperature rise mark bit is asserted. A negative value of ∆edge indicates that the current temperature of the delay chain is lower than the original temperature and the temperature drop mark bit is asserted. Both ∆edge and the mark bit are sent to the TC start module to determine the temperature trend.

Temperature Correction Start Module
The TC start module operates as follows. It first assesses whether the current temperature of the delay chain has increased, decreased, or is substantially unchanged compared to the previous temperature. If there is a clear and strong temperature trend, temperature correction is launched. Otherwise, the TDC remains in normal operation mode.
For accurate temperature trend measurement, the temperature measurement frequency, which is controlled by a timer, should be set to a suitable value. A measurement frequency that is too high increases the randomness of the temperature measurement, which may increase the probability of incorrectly identifying a temperature trend. Therefore, strict trend judgment criteria are required. In this paper, two counters, named rise_cnt and fall_cnt, were used to determine the temperature trend. Both counters started at zero, and each was incremented whenever its corresponding temperature mark bit was asserted. The difference and the sum of rise_cnt and fall_cnt were both calculated. If, before the sum of the two counters reached a pre-defined value, the difference between the two counters became greater than half this value, a temperature trend was identified, with the larger counter indicating whether the temperature was rising or falling.
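A minimal software sketch of this trend judgment is given below. The class name `TrendDetector` and the parameter `window` (the pre-defined value for the counter sum) are hypothetical. The text does not state what happens once a trend is reported or when the sum reaches the pre-defined value without a trend, so resetting both counters in those cases is an assumption of this sketch.

```python
class TrendDetector:
    """Software model of the rise_cnt / fall_cnt trend judgment."""

    def __init__(self, window):
        self.window = window  # pre-defined limit for rise_cnt + fall_cnt
        self.rise_cnt = 0
        self.fall_cnt = 0

    def _reset(self):
        self.rise_cnt = 0
        self.fall_cnt = 0

    def update(self, rise_mark, fall_mark):
        """Feed one temperature measurement; return 'rise', 'fall', or None."""
        self.rise_cnt += int(rise_mark)
        self.fall_cnt += int(fall_mark)
        diff = abs(self.rise_cnt - self.fall_cnt)
        if diff > self.window / 2:
            trend = "rise" if self.rise_cnt > self.fall_cnt else "fall"
            self._reset()  # assumption: start a fresh window after a trend
            return trend
        if self.rise_cnt + self.fall_cnt >= self.window:
            self._reset()  # assumption: window filled without a clear trend
        return None


detector = TrendDetector(window=8)
for _ in range(5):
    trend = detector.update(rise_mark=True, fall_mark=False)
print(trend)
```

Requiring the difference to exceed half the window means that alternating rise and fall marks, which are typical of measurement noise, never trigger a correction.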
Once it is evident that there is a temperature trend, a threshold judgment is performed. If Equation (2) is true, the temperature correction process is launched, and the LUT correction module and the LUT reorder module are sequentially triggered to correct and sort edge values and store the new values in the LUT.
where ∆edge_tail and ∆edge_van are the variations of the edge values of the thermometric cells matched with the thermometric registers, and threshold is a pre-defined temperature correction threshold. Note that once temperature correction is launched, the LUT used for the decoding process must be updated, so time measurement cannot be performed during the temperature correction process and must wait until it has been completed. The introduction of a threshold prevents the TDC from performing unnecessary temperature corrections for temperature fluctuations within a small range. The threshold is a parameter that should be determined by executing the clock sorting operation repeatedly while the ambient temperature varies and can be set according to the FPGA temperature adaptability and the working environment.

LUT Correction Module
Since the delay chain has a cumulative effect on the delay of the input clock signal, the effect of temperature on the delay chain is also cumulative, i.e., cells at the back of the delay chain exhibit a larger edge value variation than cells at the front of the delay chain. The main task of the LUT correction module is to correct the edge values according to the position of the cells in the delay chain.
For example, for the first sub-delay chain, the correction is given by Formula (3), where num_n is the number of the nth cell of the delay chain, which is equal to n; num_van and num_tail are the numbers of the thermometric cells; edge_i is the current edge value; and edge*_i is the edge value after correction. Each cell has a rising and a falling edge value, so there are twice as many edge values as cells, i.e., i is equal to 2n or 2n − 1. The effect of temperature on the delay chain is modeled with each cell as the minimum element, so the same correction parameters are used to correct both types of edges for the same cell. The other three sub-delay chains are corrected in the same way as the first sub-delay chain. Each cell is corrected using the correction parameters for its cell position, e.g., the edge values of the last cell of the second sub-delay chain are corrected using the same correction parameters as the last cell of the first sub-delay chain.
Because of the cross-clock cycle phenomenon, edge*_i needs to be checked after it has been corrected by Formula (3). If edge*_i is larger than the system clock cycle, the real edge*_i is equal to its value minus the system clock cycle. If edge*_i is smaller than 0, the real edge*_i is equal to its value plus the system clock cycle. The cross-clock cycle phenomenon is discussed further in Section 3.5.
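The position-dependent correction and the subsequent wrap-around check can be illustrated as follows. Formula (3) is not reproduced in this sketch. As a stand-in, the code assumes a linear interpolation of the measured variations between the two thermometric cells, which is only one plausible reading of "correct the edge values according to the position of the cells". The function name and all numerical values are hypothetical. Only the wrap-around step follows the text directly.

```python
def correct_edge(edge_ps, cell_num, num_van, num_tail,
                 delta_van, delta_tail, clock_period_ps):
    """Correct one edge value and fold it back into a single clock period."""
    # ASSUMPTION: linear interpolation between the two thermometric cells.
    slope = (delta_tail - delta_van) / (num_tail - num_van)
    corrected = edge_ps + delta_van + slope * (cell_num - num_van)

    # Cross-clock cycle check: fold the corrected value into [0, period).
    # An edge landing exactly on one period is treated as 0 here (assumption).
    if corrected >= clock_period_ps:
        corrected -= clock_period_ps
    elif corrected < 0:
        corrected += clock_period_ps
    return corrected


# Hypothetical 5-cell chain: an edge near the end of the period wraps to 10 ps.
print(correct_edge(2490.0, cell_num=5, num_van=1, num_tail=5,
                   delta_van=0.0, delta_tail=20.0, clock_period_ps=2500.0))
```

Because both edges of a cell share the same cell number, the rising and falling edge values of that cell receive the same correction term.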
In addition, to avoid repeatedly triggering temperature corrections at the same temperature, the parameter edge_n in Formula (1) should be updated by reading the corrected LUT after each LUT correction.

LUT Reorder Module
After the initialization clock sorting operation has been executed and the edge values of the clock signal corresponding to each cell have been obtained, the edge values are sorted and stored with their cell numbers in the LUT. Once the length of the delay chain reaches a certain value, the delay applied to the input clock signal will be large enough to result in a cross-clock cycle phenomenon. Therefore, there may be multiple cells with similar phases in the delay chain, as shown in Figure 9. As shown in Figure 9a, in its original state, the rising edge value of C_N is slightly smaller than that of C_2. Therefore, the rising edge value of C_N must be placed before the rising edge value of C_2 during the sorting process, before being stored in the LUT. When the temperature rises, the edge increment of the cells at the back end of the delay chain must be larger than the edge increment of the cells at the front end due to the cumulative effect of the delay. Therefore, after edge correction, the rising edge value of C_N will be larger than the rising edge value of C_2. In contrast, as shown in Figure 9b, if the edge value is very close to the clock cycle, the cross-clock cycle phenomenon will occur as the temperature rises. (The situation when the temperature drops is similar and is not described here.) As a result, the order of the edge values in the original LUT will be disturbed. Therefore, the LUT reorder module should be introduced.
A sort operation in an FPGA is more complicated than a software sort, as the use of arrays consumes a large amount of logic resources. Additionally, the design platform may stall in the analysis and synthesis process due to the use of a large array and a loop structure.
Therefore, pairwise comparison is adopted to construct the reorder module. Each time, two adjacent edge values are read from the LUT and compared, and the larger value is retained. This value is then compared with the largest value found so far, and the larger of the two is kept as the candidate for the biggest edge value in the current round of comparison. The candidate is updated in this way after every pairwise comparison. When all the edge values in the LUT have been compared, the candidate (the biggest edge value in the current round of comparison) is placed in the storage address of the last data value read. To prevent data loss, the data originally at this address is stored in the address from which the biggest edge value was taken. In summary, each round of comparison identifies the biggest edge value for that round and stores that value and its cell number at the end of the LUT, while the edge value and cell number previously stored at that position are moved to the original address of the biggest edge value.
To illustrate this process clearly, information on the first 10 edges was extracted from a real LUT so that the comparison process could be explained in detail. The edge values are given in picoseconds. The first round of comparison is shown in Table 1. Cell number 346 and its edge value 92 were retained in the 10th memory address. The edge values stored in the first 9 memory addresses would then be sorted by the second round of comparison. The updated LUT for the second round of comparison is shown in Table 2. In the second round, the data at the even memory address (10th) was no longer available for the fifth comparison. Therefore, the edge value at the last even memory address (8th) was read instead and compared with that at the odd memory address (9th). After the second round of comparison, cell number 259 and its edge value 53, which had been stored in the 4th memory address, replaced cell number 291 and its edge value 36 in the 9th memory address. The third round of sorting, shown in Table 3, would sort the edge values stored in the first 8 memory addresses.

[Tables 1 to 3: memory address, cell number, and edge value (ps) for the first three rounds of comparison.]

With this process, the sorting does not need to be repeated for every memory location. If there are m edge values, then m − 1 rounds of comparison are required. Once the reorder process is completed, the overall temperature correction process is finished, and the TDC switches back to the normal operation mode.
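The reorder procedure amounts to a selection sort that moves the biggest remaining edge value to the end of the unsorted region in each round. The Python sketch below shows that logic in software. It scans entries one at a time rather than reading them in pairs as the hardware does, and the function name and the five example entries are illustrative, not the contents of Tables 1 to 3.

```python
def reorder_lut(lut):
    """Sort a list of (cell_number, edge_value_ps) entries in place by edge value.

    Each round finds the biggest edge value among the unsorted entries and
    swaps it with the entry at the last unsorted address, so m entries need
    m - 1 rounds.
    """
    m = len(lut)
    for last in range(m - 1, 0, -1):           # m - 1 rounds in total
        biggest = 0                            # address of the candidate
        for addr in range(1, last + 1):
            if lut[addr][1] > lut[biggest][1]:
                biggest = addr                 # update the candidate
        # Swap so that no entry is lost: the displaced entry takes the
        # address the biggest value came from.
        lut[biggest], lut[last] = lut[last], lut[biggest]
    return lut


# Illustrative entries (cell number, edge value in ps) after a correction.
lut = [(144, 46), (202, 22), (259, 53), (346, 92), (291, 36)]
print(reorder_lut(lut))
```

Only one candidate value and its address have to be held between comparisons, which is consistent with the stated goal of avoiding large arrays and loop structures in the FPGA.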

Experimental Validation
A two-channel NUMP TDC was implemented in a 60 nm Altera Cyclone 10 LP FPGA (10CL120YF780C8G) to validate the proposed temperature correction method. Each channel contained 380 cells to generate 380 delayed 400 MHz clocks.

Variation of Edge Values with Temperature Changes
The initialization clock sorting operation was executed at 40 °C. Figure 10 shows the 760 edge values in the LUT with and without the temperature correction module as the temperature varied. The ideal edge value distribution is also shown in these graphs for ease of comparison. Figure 10a, which shows the effect of a temperature rise, is taken as an example. The red line in this figure shows that the edge values at the front had larger variations than the edge values at the back. The reason was that, although the edge values at the front were small, a large number of cells matched them. Due to the cumulative delay effect, the latter part of the delay chain was more affected by temperature than the front part, which was reflected in the edge values by a larger variation. However, the edges of the cells at the end of the delay chain crossed a whole system clock period as the temperature rose, so their edge values restarted from zero. Therefore, the temperature correction module was required. After LUT correction and reordering were completed, the distribution of the edge values in the LUT was as shown by the green line in Figure 10. The offside edges had disappeared, and the LUT could be reused for decoding to calculate the fine timestamp. Figure 10b shows the effect of a temperature decrease, which was similar to that of a temperature rise and is not described here to avoid repetition.

Nonlinearities
TDC nonlinearities directly affect the resolution of the system. The most commonly used metrics to characterize TDC nonlinearities are differential nonlinearity (DNL) and integral nonlinearity (INL) [31,32]. DNL describes the difference between each bin width and the ideal bin width, which is also known as the least significant bit (LSB), and reflects the degree of nonlinearity of a single bin width.
INL represents the cumulative offset, and for some conventional TDCs its value can be obtained by integrating the DNL values.
However, the edges of all the time bins of the NUMP TDC were measured independently. Thus, the DNL errors of the NUMP TDC did not accumulate to produce INL errors as they do in conventional TDCs. In the decoding stage, using the binary search algorithm and the sampled delayed clock states latched by the hit signal, the fine timestamp region was narrowed step by step until it was located between two adjacent memory addresses. The mean of the corresponding edge values was taken as the fine timestamp. Note that the edge values used in the decoding process were determined accurately by the initialization clock sorting operation. In other words, due to the introduction of the clock sorting module, a bin-by-bin calibration was executed to avoid the negative influence of INL that might be caused by uneven bin sizes. Edge values were sorted and stored in the LUT during the initialization clock sorting mode. In normal operation mode, each fine timestamp measurement result depended only on the two adjacent edge values obtained at the final stage of decoding and was not related to the other edge values in the delay chain. Thus, the NUMP TDC had no INL; it had only DNL, because the final bin obtained by the decoding module might not be equal to the LSB.
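For reference, the conventional definitions of DNL and INL discussed above can be written as a short calculation. This is a generic sketch, not the authors' analysis code: the function name and the example bin widths are hypothetical, and the LSB is taken as the mean bin width, which equals the clock period divided by the number of bins when the bins tile one period.

```python
def dnl_inl(bin_widths_ps):
    """Return (lsb, dnl, inl) with DNL and INL expressed in units of LSB."""
    lsb = sum(bin_widths_ps) / len(bin_widths_ps)   # ideal (mean) bin width
    dnl = [(w - lsb) / lsb for w in bin_widths_ps]  # per-bin deviation
    inl = []
    running = 0.0
    for d in dnl:                                   # cumulative sum of DNL
        running += d
        inl.append(running)
    return lsb, dnl, inl


# Hypothetical bin widths in ps.
lsb, dnl, inl = dnl_inl([2.0, 4.0, 3.0, 3.0])
print(lsb, dnl, inl)
```

The cumulative sum in the last loop is the step that applies to conventional delay-line TDCs; in the NUMP TDC each edge is measured independently, so this accumulation does not occur.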
The DNL of the delay chain when the temperature changed from 40 °C to 70 °C and to 10 °C is shown in Figure 11a,b, respectively. When the temperature rose from 40 °C to 70 °C, the DNL after temperature correction was within [−0.81, 1.20] LSB, and the RMS of the DNL was 0.34 ps. For comparison, the DNL of the initialization clock sorting operation was also tested; it was within [−0.84, 1.48] LSB, and the RMS of the DNL was 0.48 ps. When the temperature dropped from 40 °C to 10 °C, the DNL after temperature correction was within [−0.74, 1.48] LSB, and the RMS of the DNL was 0.29 ps. For the initialization clock sorting operation at 10 °C, the DNL was within [−0.81, 1.01] LSB, and the RMS of the DNL was 0.42 ps. The DNL obtained with the temperature correction module was therefore similar in accuracy to the DNL obtained by the initialization clock sorting operation, which indicated that the LUT updated by the temperature correction module was practical.

TDC Measurement
As shown in Figure 12, the external pulses generated by an Analog Devices Inc. clock generation board (AD9548/PCBZ) were sent to the FPGA through a low-voltage differential signaling (LVDS) port and fed into the two TDCs. The differences between the two TDC measurements were calculated in the FPGA and transmitted to the PC through a USB cable.

Figure 12. The diagram of TDC measurement. The external test pulses were sent to the FPGA through a low-voltage differential signaling (LVDS) port and fed to the two TDCs.

The performance of the two TDCs was characterized by calculating the RMS of the differences between the two TDC measurements. As shown in Figure 13, the TDC resolution degraded significantly due to temperature changes. However, the proposed temperature correction method successfully eliminated the influence of temperature on the TDC resolution in the range of 5 °C to 80 °C.

Figure 13. Single-shot resolutions of TDC for temperature variations from 5 °C to 80 °C. The initialization clock sorting operation was executed at 40 °C.

Figure 14 shows a statistical histogram of more than 100,000 measurement results collected during this experiment at different temperatures. This histogram clearly shows that, although the temperature changed, the TDC measurement results still followed an excellent Gaussian distribution. The overall TDC resolution was 8.8 ps (RMS), which included all the error factors, including DNL and jitter. Note that all RMS values given in this paper represent the single-shot resolution, which is 1/√2 of the dual-channel resolution.

Figure 14. Distribution of the differences between the two TDC measurements in the temperature range from 5 °C to 80 °C. Automatic temperature corrections were applied, and the overall TDC resolution was 8.8 ps.
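The relation between the dual-channel and single-shot resolutions can be sketched as follows. The function name and the sample values are hypothetical. The sketch takes the RMS about the mean of the differences and assumes that the two channels contribute equal, independent timing errors, which is the usual justification for the 1/√2 factor.

```python
import math


def single_shot_rms(differences_ps):
    """Single-shot RMS resolution from the differences of two TDC channels."""
    n = len(differences_ps)
    mean = sum(differences_ps) / n
    # RMS about the mean, so a constant offset between channels is removed.
    dual_channel_rms = math.sqrt(
        sum((d - mean) ** 2 for d in differences_ps) / n
    )
    # Assumption: equal, independent errors in both channels.
    return dual_channel_rms / math.sqrt(2)


# Hypothetical differences in ps between the two channel timestamps.
print(single_shot_rms([10.0, -10.0, 10.0, -10.0]))
```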

Correction Parameters of Temperature Correction Module
In our design, the four sub-delay chains of the same channel were corrected simultaneously. This method had a potential problem: the four sub-delay chains of the same channel might not show similar delay characteristics for temperature changes. In other words, when a phase shift that was sufficient to trigger a temperature correction occurred in one sub-delay chain, it was not clear whether the other three sub-delay chains would change to the same extent.
To examine this issue further, the linear fitting results of delay chain_1 to delay chain_4 are listed in Table 4, which were similar to the equations shown in Figure 2. Furthermore, based on the equations shown in Table 4, the linear fitting results of the slopes are shown in Table 5, which were similar to the equations shown in Figure 3. As shown in Table 4, the four sub-delay chains had similar delay characteristics, and the slopes of the fitted functions were only slightly different. ∆K_MAX in Table 4 was calculated with delay chain_1 as the reference. Although the intercepts were quite different, the correction process only used the correction parameters related to each cell number and added them to the original edge values. Thus, the correction effect was not related to the intercepts. As shown in Table 4, the maximum difference in slope was 0.032 when the temperature varied from 10 °C to 70 °C. At 70 °C, the maximum correction error was 0.032 × 95 = 3.04 ps, which was smaller than the LSB (3.2895 ps).
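The arithmetic behind this bound can be checked directly. In the sketch below, the variable names are illustrative; the 2500 ps period follows from the 400 MHz clock, the 760 edges from the 380 cells per channel, and the 95 cells per sub-delay chain from dividing 380 cells among four sub-delay chains.

```python
CLOCK_PERIOD_PS = 1e6 / 400        # 400 MHz clock -> 2500 ps
N_EDGES = 760                      # rising and falling edges of 380 cells
CELLS_PER_SUB_CHAIN = 380 // 4     # 95 cells in each sub-delay chain
MAX_SLOPE_DIFF = 0.032             # maximum slope difference from Table 4

lsb_ps = CLOCK_PERIOD_PS / N_EDGES
max_error_ps = MAX_SLOPE_DIFF * CELLS_PER_SUB_CHAIN

print(f"LSB = {lsb_ps:.4f} ps, worst-case correction error = {max_error_ps:.2f} ps")
print("error below one LSB:", max_error_ps < lsb_ps)
```

The worst case occurs at the last cell of a sub-delay chain, where the slope mismatch has accumulated over all 95 cells.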
As shown in Table 5, the slope of the fitted equations varied between 0.027 and 0.028, which further indicated that the four sub-delay chains had similar delay characteristics across a wide temperature range. Each cell's delay would increase (or decrease) by 0.054 ps to 0.056 ps when the temperature rose (or dropped) by one degree Celsius.

Carry Chain Differences between Cyclone V and Cyclone 10 LP
The NUMP TDC that we originally proposed was based on the Altera 28 nm Cyclone V FPGA (5CEBA4F23C7N) and implemented with 400 delayed 500 MHz clocks, where each sub-delay chain had 100 cells. Its temperature stability was also tested. The NUMP TDC based on the Cyclone V FPGA was not sensitive to temperature fluctuations. The resolution of the TDC without temperature correction changed only slightly, from 5.4 ps to 7.3 ps, when the temperature was increased from 20 °C to 56 °C [28]. In contrast, the NUMP TDC based on the Cyclone 10 LP FPGA was very sensitive to temperature fluctuations. As can be seen from the experimental results in Figure 13, the resolution of the NUMP TDC without temperature correction changed dramatically, from 7.2 ps to 43.4 ps, when the temperature increased from 40 °C to 80 °C.
The 60 nm Cyclone 10 LP FPGAs are a newer generation than the 28 nm Cyclone V FPGAs, with advantages that include low power consumption and low cost. These FPGAs also differ in terms of their underlying logic resources [29,33]. Each logic array block (LAB) of a Cyclone V FPGA consists of 10 adaptive logic modules (ALMs), whereas each LAB of a Cyclone 10 LP FPGA has 16 logic elements (LEs). For TDC applications, the main difference between the two FPGAs lies in the implementation of the carry chain, which is a constituent part of the delay chain. In the Cyclone V FPGA, the carry chain consists of "dedicated full adders". However, in the Cyclone 10 LP FPGA, the carry chain is formed by the LUT. Figure 15 shows these two distinct carry chain implementations.
It was evident that the Cyclone V FPGA and Cyclone 10 LP FPGA had different carry chain structures. Although the difference was not obvious in terms of the propagation delay, there were huge temperature stability differences, which might result in a large difference in temperature characteristics for the same TDC scheme implemented on different devices.

Resource Usage and Power Consumption
The resources used by the NUMP TDC with temperature correction mainly included logic cells, logic registers, memory bits, phase-locked loops (PLLs), and routing resources. The logic utilizations of the NIOS CPU, the NUMP TDC, and the temperature correction module are compared in Table 6. Note that multiple TDC channels could share one temperature correction module, and one NIOS CPU could support all the TDC channels in the FPGA. Theoretically, 40 to 70 channels of NUMP TDCs could be implemented with temperature correction in a single Cyclone 10 LP FPGA.
The power consumption of a TDC was determined by both the resource usage and the event rate. The power consumption of the NUMP TDC system operating at an event rate of 1 MHz was calculated using the Cyclone 10 LP Early Power Estimator. The results showed that the delay chains, which operated with a clock frequency of 400 MHz, dominated the power consumption of the NUMP TDC (39 mW). The power consumptions of the NIOS CPU, a single channel of the NUMP TDC, and the temperature correction module were 58 mW, 40 mW, and 1 mW, respectively.

Features of the Temperature Correction Method
The temperature correction method proposed in this paper has some unique features compared to many of the conventional temperature correction methods.
Firstly, it supports asynchronous temperature correction between multiple channels. Although the overall FPGA chip will have the same ambient temperature, the temperature states of the individual channels may not be identical, so the temperature characteristics of the individual delay chains may be different. Therefore, whether the temperature is measured by an external temperature sensor [16,17] or a dedicated delay chain [18], any temperature correction will be triggered for all delay chains, which will inevitably introduce a certain level of error. The temperature correction method proposed in this paper could measure the temperature state of each channel's delay chain and determine whether each channel needs to be corrected, thus providing individualized delay chain correction for each channel. Figure 16 compares the effect of synchronous correction and asynchronous correction through an internal signal test.
Secondly, no intensive early-stage experiments are required to determine the relationship between the temperature and the correction parameters, which makes the method easy to reproduce on different FPGA boards. For different production batches of the same FPGA series, the delay and temperature characteristics of the delay chain may not be identical due to slight variations in the manufacturing process. The difference in delay characteristics can be fully calibrated during the clock sorting operation at initialization. As for the temperature sensitivity, other temperature correction methods require a special temperature control device to calibrate the correction parameters at multiple temperature states in order to establish the relationship between the correction parameters and temperature. This process is cumbersome, and the correction parameters cannot be generalized, i.e., the exact same experiments need to be repeated for each FPGA board [16,17,18].
However, these experiments are not necessary when using the temperature correction method proposed in this paper. Although the temperature characteristics of different FPGAs are not linear, as shown in Figure 3, the cumulative characteristics of the delay chain for the clock signal input are still constant, as shown in Figure 2.
Thirdly, no external component, such as a temperature sensor, is required. The method proposed in this paper accurately measures the temperature state of the delay chain by deploying dedicated thermometric registers at both ends of the delay chain. This greatly reduces the error introduced by temperature measurement.
Finally, the method proposed in this paper offers excellent flexibility. While deploying more thermometric registers can improve the accuracy of the temperature measurement, this can also be achieved by obtaining the mean of the edge value variation. The strategy of using only two thermometric registers at both ends of a single sub-delay chain is more practical and saves resources while still achieving a good temperature correction effect. Additionally, a larger threshold can be set to reduce the frequency of correction and the dead time. For applications that are not sensitive to the dead time, a smaller threshold can be used to increase the temperature stability of the TDC.
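The two adjustable elements mentioned above, averaging the edge value variation and choosing the threshold, can be illustrated with a small sketch. It is a simplified model under stated assumptions: the drift trace is synthetic, the units are arbitrary, and the function names `mean_edge_variation` and `count_corrections` are hypothetical rather than taken from the implementation.

```python
from statistics import mean


def mean_edge_variation(current_edges, reference_edges):
    """Average the edge value variation over several thermometric registers."""
    return mean(c - r for c, r in zip(current_edges, reference_edges))


def count_corrections(drift_trace, threshold):
    """Count how many corrections a drift trace triggers.

    A correction is launched whenever the variation since the last
    correction exceeds the threshold; the reference is then reset.
    """
    reference = 0.0
    corrections = 0
    for value in drift_trace:
        if abs(value - reference) > threshold:
            reference = value
            corrections += 1
    return corrections


# Synthetic, monotonic drift from 0.0 to 10.0 in steps of 0.5 (made-up data).
trace = [0.5 * k for k in range(21)]

small_threshold = count_corrections(trace, 1.0)
large_threshold = count_corrections(trace, 3.0)
averaged = mean_edge_variation([102.0, 101.0], [100.0, 100.0])
```

On this trace the smaller threshold triggers three times as many corrections as the larger one, which mirrors the trade-off between dead time and temperature stability described above.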
Most FPGA devices have logic cells that are close to each other and have very similar performances. Thus, no large thermal gradients are expected along the delay line. Therefore, the temperature correction method presented in this paper is probably applicable to other technologies and devices, although it was only validated on a Cyclone 10 LP FPGA in this study.

Conclusions
To achieve high performance over a wide temperature range, this paper proposed a temperature correction scheme that integrates a temperature monitor and automatic correction for the NUMP TDC. The effect of temperature on the delay chain was discussed, and a method was proposed to monitor the temperature state of the delay chain by measuring changes in the edge value of the thermometric cells. When the variation in the edge value exceeded a given threshold, temperature correction was launched. The temperature correction was performed in the FPGA in real time.
In conclusion, this paper developed a low-cost, high-performance method that effectively reduces the effect of temperature fluctuations on an FPGA-based NUMP TDC. Using this method, a resolution of 8.8 ps RMS over a wide temperature range from 5 °C to 80 °C was achieved in a NUMP-TDC implemented in a Cyclone 10 LP FPGA.