A Full Parallel Event Driven Readout Technique for Area Array SPAD FLIM Image Sensors

This paper presents a full parallel event driven readout method, implemented in an area array single-photon avalanche diode (SPAD) image sensor for high-speed fluorescence lifetime imaging microscopy (FLIM). By adopting this readout method, the sensor records and reads out only the effective time and position information, which reduces the amount of data. The image sensor includes four 8 × 8 pixel arrays. In each array, four time-to-digital converters (TDCs) quantize the photons' arrival times, and two address record modules record the column and row information. Monte Carlo simulations of the pile-up effect induced by the readout method were performed in MATLAB. The sensor's resolution is 16 × 16. The time resolution of the TDCs is 97.6 ps and the quantization range is 100 ns. The readout frame rate is 10 Mfps, and the maximum imaging frame rate is 100 fps. The chip's output bandwidth is 720 MHz with an average power of 15 mW. The lifetime resolvability range is 5–20 ns, and the average error of the estimated fluorescence lifetimes is below 1% when the center-of-mass method (CMM) is used to estimate lifetimes.


Introduction
Fluorescence lifetime imaging microscopy (FLIM) is a rather new and effective tool that can be used to analyze complex biological samples, either at the microscopic or macroscopic level [1,2]. The progress of confocal microscopy improves the time resolution. The map of fluorescence lifetime allows one to discriminate different fluorophores and to acquire valuable insights into the behavior of emitting molecules, thus obtaining information like local pH and oxygen concentration in cells, etc. [3].
The two most common techniques for measuring the fluorescence lifetime are the modulated frequency-domain technique and the time-domain technique [4]. Time-domain techniques include time-correlated single-photon counting (TCSPC) and time-gated technique [5]. TCSPC allows for high accuracy in measuring lifetime and it is the most photon-efficient technique.
In commercial TCSPC systems, one detector, typically an avalanche photodiode (APD) or photomultiplier tube (PMT), with one time-to-digital converter (TDC) measurement channel is raster scanned across a sample. At each point in the image, a laser is pulsed and the arrival time of the first fluorescent photon relative to the laser pulse is measured. With repeated laser pulses, the arrival times of these individual photons are collected, and the lifetime is extracted by fitting an exponential decay curve to the resulting histogram of photon arrival times. Although TCSPC has been developed for many years, several drawbacks remain. The imaging system is expensive and cumbersome. Compact, high-speed, and portable system-on-chip FLIM solutions that are robust and easy to operate are in increasing demand, especially in clinical and commercial applications [6].
The operating principle of the event record circuits is shown in Figure 2. A group of interleaved token-passing shift registers is used to distribute events to the array of TDCs. The time data of detected photons are transmitted to the TDCs in turn, so that the TDC array can record the photons' arrival times successively [24]. The column and row position data are obtained by two encoders, respectively. The position data are then stored in the corresponding registers, under the control of the output of the OR-tree. To guarantee the validity of the data, the data in one frame are treated as valid only when the numbers of column, row and time records are the same. By employing event driven readout instead of reading out all data, the data rate is reduced to 1/16 of the initial rate, which significantly relaxes the requirement for input/output (I/O) bandwidth.
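The bookkeeping of the event record circuits can be illustrated with a short behavioral sketch. It is a simplified software model, not the chip's logic: the names `record_frame`, `NUM_TDCS` and `lost_positions`, the round-robin token order, and the rule that events arriving after all TDCs are occupied are dropped are assumptions made for illustration. Only the count of four TDCs per 8 × 8 array and the equal-count validity rule are taken from the paper.

```python
NUM_TDCS = 4  # four TDCs per 8 x 8 sub-array, as stated in the abstract


def record_frame(events, lost_positions=()):
    """Model one readout frame of a single 8 x 8 sub-array.

    events: list of (arrival_time_ns, row, col) tuples in arrival order.
    lost_positions: hypothetical set of event indices whose row/column
        code failed to latch, used only to exercise the validity check.
    """
    times, rows, cols = [], [], []
    token = 0  # index of the TDC that currently holds the token
    for idx, (t_ns, row, col) in enumerate(events):
        if len(times) == NUM_TDCS:
            break  # assumption: once every TDC is occupied, later events are ignored
        times.append((token, t_ns))        # the token holder records the time
        token = (token + 1) % NUM_TDCS     # pass the token to the next TDC
        if idx not in lost_positions:
            rows.append(row)
            cols.append(col)
    # A frame is valid only when the three record counts agree.
    valid = len(times) == len(rows) == len(cols)
    return {"times": times, "rows": rows, "cols": cols, "valid": valid}


if __name__ == "__main__":
    frame = record_frame([(1.0, 2, 3), (5.5, 0, 7)])
    print(frame["valid"], frame["times"])
```

In this model a frame with a missing position record is flagged invalid rather than repaired, which mirrors the equal-count rule described in the text.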

SPAD Pixel Array
Figure 3 shows the implementation of the in-pixel circuits. An active quenching circuit based on current sensing is implemented instead of quenching resistors. After the pixels are reset, transistors M1, M2, M3 and M4 form a positive feedback loop, which senses the sudden increase in the cathode current of the SPAD and quenches the current promptly [25]. In addition, the logic circuit controls transistor M5 to speed up the quenching process. After quenching is finished, the SPAD does not return to Geiger mode instantly, but waits for the global RESET signal to open M6. The output voltage of the quenching circuit is converted into a sub-nanosecond voltage pulse by the mono-stable circuit.

The pulse is then sent through the row and column OR-trees to provide the row and column outputs of the 8 × 8 pixel array. The whole 8 × 8 pixel array has an 8-bit row bus and an 8-bit column bus as outputs. In addition, a calibration module is embedded in the pixels to calibrate the TDC quantization error induced by signal path delay.

Time-to-Digital Converters Array
The TDC array, which is similar in structure to that shown in [14], has a time resolution of 97.6 ps. The structure of the TDC is shown in Figure 4. To minimize the TDC's power consumption, the conversion is performed in reverse START-STOP mode. The gated ring oscillator (GRO) begins to oscillate as soon as the START signal occurs. The global STOP signal is synchronized with the excitation laser. In addition, the oscillator can start oscillating immediately once the node voltages of the GRO have been reset. A 7-bit counter records the number of GRO cycles as the seven most-significant bits (MSBs) of the measurement. The three least-significant bits (LSBs) are provided by the eight states of the GRO. The 10-bit time information is stored in Register A and Register B alternately, while the data in the other register wait to be read out. Figure 5 shows the timing diagram of the scheme. The RESET signal is used to reset the TDCs and pixels. The READ1, READ2, RESET1 and RESET2 signals are used to control the two groups of registers in each TDC alternately.
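The way the 10-bit code is assembled and interpreted can be sketched as follows. This is an illustrative model only: the function names are hypothetical, and the conversion to an arrival time assumes that the STOP edge falls exactly at the end of a window of 1024 LSBs, which the text does not state explicitly.

```python
LSB_PS = 97.6                  # TDC time resolution quoted in the text
COUNTER_BITS = 7               # coarse bits from the cycle counter
PHASE_BITS = 3                 # fine bits from the eight GRO states
WINDOW_PS = LSB_PS * 2 ** (COUNTER_BITS + PHASE_BITS)  # about 100 ns


def tdc_code(counter, phase):
    """Combine the 7-bit cycle count (MSBs) and 3-bit GRO state (LSBs)."""
    if not 0 <= counter < 2 ** COUNTER_BITS:
        raise ValueError("counter out of range")
    if not 0 <= phase < 2 ** PHASE_BITS:
        raise ValueError("phase out of range")
    return (counter << PHASE_BITS) | phase


def arrival_time_ps(code, window_ps=WINDOW_PS):
    """Reverse START-STOP: the code measures photon-to-STOP time.

    Assumption: STOP coincides with the end of the measurement window,
    so the arrival time after excitation is the window minus the code.
    """
    return window_ps - code * LSB_PS


if __name__ == "__main__":
    code = tdc_code(counter=100, phase=5)
    print(code, arrival_time_ps(code))
```

With these numbers the full-scale range is 1024 × 97.6 ps, which is consistent with the 100 ns quantization range quoted in the abstract.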

Pile-Up Effect during Events Readout
Because of the readout circuits of this chip, not every pulse can be detected by the TDC arrays or the position recording circuits. The pile-up effect in TCSPC can be classified into three types. The first is the traditional pile-up effect, which can be alleviated by reducing the photon rate to 1%; the photon rate is directly proportional to the emission intensity. The second is caused by the dead time of the SPAD devices. Thanks to the global reset of the active quenching circuits, however, this part can be neglected. The last is the dead time of the processing circuits of the chip, i.e., the time interval between a photon's arrival and its detection. The chip proposed in this paper is influenced mostly by the third kind of pile-up effect. Pulses from each pixel are shortened by the mono-stable circuit but still have a finite length tp. For the 8 × 8 pixel array, if any two photon events are less than tp apart, the two pulses merge at the output of the OR-tree. Consequently, only the first event is processed further and the second event is missed completely. Figure 6 shows the process of an 8 × 8 pixel array detecting photons. It can be seen that the photon events labelled 2, 3 and 7 are not detected because of the dead time of the OR-tree. Thus, the histogram of photon arrival times is modulated by the readout circuits.
Another factor that influences the detection behavior is the interaction among pixels within the same sub-array, which is different from the common device-level cross-talk phenomenon. It derives from the variation of the probability of connecting to a TDC, which is modulated by the neighboring pixels. If the sensor works as a Mini-Silicon Photomultiplier (MSP), all pixels measure the same lifetime; therefore, the interactions within the sub-array are also the same, and their influence can be neglected.
However, if the sensor works as an area array detector, each pixel may need to detect a different lifetime. A short lifetime means a rapid decay, so most photons are expected to be detected shortly after excitation; a long lifetime means the opposite. As a result, the TDC channels are more likely to be occupied by pixels that detect short lifetimes, and the pile-up effect is aggravated. Furthermore, the modulation is not uniform over the whole detection period. To simplify the analysis, the fluorescence is assumed to have a single-exponential decay. The influence is analyzed and simulated for the Maximum Likelihood Estimator (MLE), the Center-of-Mass Method (CMM) and the Least-Squares Method (LSM). When the sensor works as an MSP, the average number of photons that one pixel can detect is [26]: where τ is the lifetime, N² is the number of pixels and µ is the expected number of photons the pixel array detects.
When the sensor works as an area array detector, only one 8 × 8 pixel array is analyzed because each 8 × 8 pixel array is independent. Assuming that τi,j is the lifetime that pixel (i, j) needs to detect, the probability of photon detection on pixel (i, j) is: where P0i,j is the photon rate of pixel (i, j). Then the probability of failing to detect a photon on pixel (i, j) is: If the interaction of other pixels is neglected, then for a certain pixel, taken here to be pixel (1, 1), the probability of photon detection as a function of time is:
To simulate the influence of the pile-up effect, a Monte Carlo simulation of the operation of the chip is carried out in MATLAB. The MATLAB random number generator is used to simulate individual photon events with the appropriate statistical distribution. For a single-exponential decay, the probability that pile-up occurs can be analyzed. Figure 7a,b shows the simulated count error of events in different measuring windows for pixels working under different values of τMSP/tp and τARRAY/τ1,1, respectively, where τMSP is the lifetime measured by the pixel array when it is used as an MSP, τ1,1 is the lifetime measured by pixel (1, 1) when the SPAD array is used as an array sensor, and τARRAY is the lifetime measured by the other pixels. The dashed line is the simulated data, while the solid line is the theoretical curve. The two figures show that the simulation results agree with the theoretical prediction. During the period close to the excitation, the pixel array is more likely to detect photons, so the pile-up effect is more apparent and the count error increases. Later, the probability of photon detection falls, so the distribution gradually approaches a single-exponential curve. In MSP working mode, the pile-up effect worsens as τMSP/tp becomes smaller.
When the sensor works as an array imager, however, the influence is modulated over time. When τARRAY/τ1,1 = 0.1, pixel (1, 1) detects a longer lifetime than the other pixels do. At the beginning of the detection period, the pile-up effect from pixel (1, 1) is then weaker and the count error is comparatively small. Later in the detection period, however, the pile-up effect from pixel (1, 1) becomes strong and the count error is comparatively large. In the circuit simulation of the chip, tp is found to be 360 ps.
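The OR-tree merging described in this section can be explored with a small Monte Carlo sketch. The paper's simulation was done in MATLAB; the Python version below is an independent, simplified illustration and not the authors' code. It assumes that every pixel has the same photon rate and lifetime, that a pulse arriving while the OR-tree output is still high extends the merged pulse, and it ignores the limited number of TDCs. The names `or_tree_filter` and `simulate` are hypothetical.

```python
import random


def or_tree_filter(arrival_times_ns, tp_ns):
    """Return the events that survive merging at the OR-tree output.

    An event is detected only if the OR-tree output has already returned
    low; any pulse, detected or not, keeps the output high until t + tp.
    """
    detected = []
    busy_until = float("-inf")
    for t in sorted(arrival_times_ns):
        if t >= busy_until:
            detected.append(t)
        busy_until = max(busy_until, t + tp_ns)
    return detected


def simulate(tau_ns, tp_ns=0.36, photon_rate=0.01, pixels=64,
             cycles=5000, window_ns=100.0, seed=1):
    """Count emitted and detected events over many laser cycles."""
    rng = random.Random(seed)
    emitted = detected = 0
    for _ in range(cycles):
        times = []
        for _ in range(pixels):
            if rng.random() < photon_rate:           # pixel fires this cycle
                t = rng.expovariate(1.0 / tau_ns)    # single-exponential decay
                if t < window_ns:
                    times.append(t)
        emitted += len(times)
        detected += len(or_tree_filter(times, tp_ns))
    return emitted, detected


if __name__ == "__main__":
    for tau in (1.0, 5.0, 20.0):
        n_emit, n_det = simulate(tau)
        print(f"tau = {tau:4.1f} ns: lost fraction = {1 - n_det / n_emit:.4f}")
```

Shorter lifetimes concentrate events near the excitation, so in this model a larger fraction of them falls within tp of a neighbor, in line with the trend described for small τMSP/tp.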

Fluorescence Lifetime Imaging Algorithm
In this section, the influence of the pile-up effect on MLE, CMM and LSM is simulated and analyzed. For a fluorescence histogram with a single-exponential decay f(t) = A exp(−t/τ) in a measurement window 0 ≤ t ≤ T recorded by the 10-bit TDC (M = 1024), the lifetime estimated using MLE, τMLE, can be obtained by [27]: where Nc is the total signal count within the measurement window, h is the LSB of the TDC, and Nj is the number of recorded counts in the jth time bin (j = 1, 2, ... , M).
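For reference, the MLE condition commonly quoted in the TCSPC literature for a binned, truncated single-exponential decay has the form below. It is given here as an assumption about the expression cited as [27], written with the symbols defined above, and τMLE is the value of τ that satisfies it:

```latex
1 + \left(e^{h/\tau_{\mathrm{MLE}}} - 1\right)^{-1}
  - M\left(e^{Mh/\tau_{\mathrm{MLE}}} - 1\right)^{-1}
  = \frac{1}{N_c}\sum_{j=1}^{M} j\,N_j
```

The left-hand side is the expected bin index of a decay truncated to M bins, so the equation matches the model mean to the measured mean and has to be solved numerically for τMLE.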
The CMM can be viewed as a hardware-friendly counterpart of the MLE, although their physical definitions are not the same. The CMM is easy to implement on an FPGA [28], which makes it possible to achieve video-rate fluorescence lifetime imaging by employing CMM to estimate fluorescence lifetimes. The lifetime estimated using CMM, τCMM, can be obtained by: The LSM minimizes the chi-square, ∑{(o − e)²/e}, where o is the observed value and e is the expected value. The lifetime estimated using LSM, τLSM, can be obtained by:
Figure 8 illustrates the impact of the pile-up effect on lifetime estimation using MLE, CMM and LSM, respectively, when the pixel array measures a uniform lifetime. The theoretical results, marked as solid lines, are compared to Monte Carlo simulations, marked with asterisks (scattered points). There is little difference between MLE and CMM when the lifetime being measured is short. When the lifetime being measured becomes longer, however, not all photons triggered by the laser can be detected by the sensor because of the limited detection time window. As a result, the measurement error of CMM increases. Figure 8 also shows that tp is the dominant factor influencing the measurement accuracy.
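The difference between the two estimators for long lifetimes can be reproduced with a minimal sketch. It is not the paper's implementation: the CMM here is the plain mean of the bin-center times with no finite-window correction, the MLE solves the standard binned truncated-exponential condition by bisection, and the function names and bracket limits are assumptions chosen for illustration. Pile-up, background and dark counts are not modeled.

```python
import math


def cmm_lifetime(counts, h):
    """Center-of-mass estimate: mean of bin-center times (uncorrected)."""
    n_c = sum(counts)
    weighted = sum((j - 0.5) * n for j, n in enumerate(counts, start=1))
    return h * weighted / n_c


def mle_lifetime(counts, h, iterations=200):
    """Solve the binned, truncated-exponential MLE condition by bisection."""
    m = len(counts)
    n_c = sum(counts)
    mean_bin = sum(j * n for j, n in enumerate(counts, start=1)) / n_c

    def residual(tau):
        # Expected bin index for lifetime tau minus the measured mean index.
        return (1.0 + 1.0 / math.expm1(h / tau)
                - m / math.expm1(m * h / tau) - mean_bin)

    # Bracket chosen so that exp(m * h / tau) cannot overflow (assumption).
    lo, hi = m * h / 700.0, m * h * 1e3
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid    # model mean too small: lifetime must be longer
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ideal_histogram(tau, h, m):
    """Noise-free single-exponential histogram used only for the demo."""
    return [math.exp(-j * h / tau) for j in range(1, m + 1)]


if __name__ == "__main__":
    h, m = 0.0976, 1024  # LSB in ns and number of bins (10-bit TDC)
    for tau in (2.0, 10.0, 30.0):
        hist = ideal_histogram(tau, h, m)
        print(tau, round(cmm_lifetime(hist, h), 3), round(mle_lifetime(hist, h), 3))
```

On noise-free data the MLE recovers the lifetime regardless of the window, whereas the uncorrected CMM underestimates lifetimes that are not short compared with the 100 ns window, which is the behavior described for Figure 8.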

Figure 9 shows the interaction effect among pixels when the chip works as an area array image sensor. In this figure, τ1,1 is assumed to be 1 ns, 5 ns, 10 ns and 15 ns, respectively, and the measurement error is expressed relative to τ1,1. When τARRAY is short, the pile-up effect worsens and the deviation becomes serious. For example, when τ1,1 is 5 ns and τARRAY is 1 ns, the deviation reaches 9% with LSM. To keep the average estimation error below 1% with the CMM algorithm, the detecting range of the pixels is 5–20 ns. MLE keeps its error below 1% over the full range, but it is computationally intensive and is not practical to implement in hardware.
The hold time of the encoders also influences the accuracy of the measurements. When the time interval between two events is shorter than the hold time of the encoders, the encoders used to record the column and row data of events cannot obtain the correct position code of the first event. When that occurs, the output code of the encoder is "1000" and the data from that fluorescence excitation cycle are discarded.
The resolvability range of this TCSPC design depends on the error of the estimation algorithm and on the influencing factors of the detection operation. With software post-calibration, CMM typically has an ideal resolvability range of T/100 to T/4, where T is the period of the laser pulse. This algorithm error originates from Poisson noise [28]. The influencing factors, such as the width tp of the pulse output by the mono-stable circuit and the interaction effect among pixels, also narrow the detecting range. In this design, taking all of these into consideration, the resolvability range is approximately 5–20 ns at a 10 MHz laser excitation rate.
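As a quick check of these figures, assuming T is simply the reciprocal of the 10 MHz excitation rate:

```latex
T = \frac{1}{10\ \mathrm{MHz}} = 100\ \mathrm{ns}, \qquad
\frac{T}{100} = 1\ \mathrm{ns}, \qquad
\frac{T}{4} = 25\ \mathrm{ns}
```

Under that assumption the ideal CMM range would be about 1–25 ns, so the 5–20 ns range quoted for this design lies inside it, narrowed at both ends by the additional influencing factors.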
The detecting range of this design is suitable for applications with long decay lifetimes. However, in actual biomedical applications, such as those using Indocyanine Green (ICG), a sub-nanosecond resolvability range is of great importance. The design should be optimized to cover this extended range. First, the quantization range of the TDC should be sufficient, as the reverse arrival time of photons becomes longer under rapid decays. Second, the module delays need to be minimized to accelerate the response. Additionally, the number of TDCs shared within the same sub-array should be increased to alleviate pile-up effects.
In actual FLIM measurements, the presence of background and dark count rate (DCR) restricts the accuracy of the measurement. Because the background and DCR are considerably lower than the signal intensity, as seen in Figure 10, their influence on the pile-up effect can be ignored. Background and DCR are taken into account using the method described in [28].

Simulation Results of Circuits
The circuit design is based on a 0.13 μm 1P3M CIS process, and the SPADs are replaced by a Verilog-A model. The Verilog-A model has the advantage of directly generating random events that obey the exponential distribution. Synchronized to 10 MHz laser excitation, the proposed 16 × 16 array structure is simulated detecting various lifetimes with a 1% detection probability. The resolution of two column-level TDCs is 97 ps. Note that the resolution of the SPAD array is not limited to 16 × 16. The SPAD array can be extended by placing more 8 × 8 sub-arrays with the same TDC/SPAD ratio. However, too large an array may cause problems in layout routing. The 16 × 16 SPAD array in this work is designed to verify the proposed readout method in fast imaging mode.
The pixel is influenced by process, voltage and temperature variations. Figure 11 shows the process corner simulation result of the quenching circuit, where t_delay is the time delay between the photon's arrival and the output pulse of the pixel, and t_width is the width of the output pulse. The process corners range from 'ss (NMOS: Slow and PMOS: Slow)' to 'ff (NMOS: Fast and PMOS: Fast)', and the temperature is swept from −40 °C to 80 °C. The simulation results show that t_width varies from 90 ps to 170 ps and t_delay varies from 400 ps to 650 ps. This variation across process corners contributes to the estimation error of lifetimes.

Figure 11. The process corner simulation result of the quenching circuit: (a) t_width; (b) t_delay.
The non-linearity of the proposed TDC is analyzed with a ramp input signal. Figure 12 shows the linearity result of the proposed TDC. As the actual time range that the TDC needs to quantize is 0–95 ns, the digital output of the proposed TDC is limited to 950. The differential nonlinearity (DNL) is −0.082/+0.102 LSB, and the integral nonlinearity (INL) is −0.205/+0.282 LSB.
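The DNL and INL figures quoted for the TDC follow the usual definitions, which can be sketched as follows. This is a minimal illustration, assuming the code transition times have already been extracted from a ramp sweep. The function name, the input format and the 97.6 ps LSB used in the example are assumptions for illustration and do not describe the authors' analysis flow.

```python
import numpy as np


def dnl_inl(transitions_ps, lsb_ps):
    """Return (DNL, INL) in LSB from measured code transition times.

    transitions_ps[k] is the input time at which the output code changes
    from k to k + 1, so consecutive differences are the code bin widths.
    """
    t = np.asarray(transitions_ps, dtype=float)
    widths = np.diff(t)           # measured width of each code bin
    dnl = widths / lsb_ps - 1.0   # deviation of each bin from one ideal LSB
    inl = np.cumsum(dnl)          # accumulated deviation at the end of each bin
    return dnl, inl


if __name__ == "__main__":
    lsb = 97.6                            # illustrative LSB in ps
    ideal = np.arange(0, 11) * lsb        # ideal transition times
    stretched = ideal.copy()
    stretched[5:] += 0.1 * lsb            # one bin made 0.1 LSB too wide
    dnl, inl = dnl_inl(stretched, lsb)
    print(dnl.min(), dnl.max(), inl.min(), inl.max())
```

The reported pair of values for each metric then corresponds to the minimum and maximum of the DNL and INL arrays over all codes.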
During the FLIM simulation, the fluorescence intensity is assumed to be uniform over the whole picture, which means that every pixel detects a photon with a probability of 1%. Consequently, the simulation result is worse than an actual measured result would be, because the impact of pile-up is maximized. The maximum imaging frame rate is 100 fps, in which case a lifetime map is obtained from about 1000 photons per pixel. If more accurate imaging is needed, the imaging frame rate can be decreased to reduce the standard deviation of the estimated lifetimes.
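The photon budget behind the 100 fps figure can be checked with a few lines of arithmetic, using the 10 MHz excitation rate and the 1% detection probability stated in the text. The relative standard deviation shown is the ideal shot-noise limit for a mono-exponential decay with no background, pile-up or window truncation. That idealization is an assumption made here for illustration, and the function names are hypothetical.

```python
LASER_RATE_HZ = 10e6   # 10 MHz laser excitation, as stated in the text
P_DETECT = 0.01        # 1% detection probability per pixel per pulse


def photons_per_pixel(exposure_s: float) -> float:
    """Expected number of detected photons per pixel in one exposure."""
    return LASER_RATE_HZ * exposure_s * P_DETECT


def ideal_relative_std(n_photons: float) -> float:
    """Shot-noise-limited relative std of the lifetime estimate (idealized)."""
    return n_photons ** -0.5


if __name__ == "__main__":
    for exposure in (10e-3, 100e-3, 2.0):
        n = photons_per_pixel(exposure)
        print(exposure, n, ideal_relative_std(n))
```

A 10 ms exposure corresponds to 100,000 laser pulses and therefore to about 1000 photons per pixel, which matches the figure given for 100 fps. A 2 s exposure corresponds to 20 M cycles.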
Picture No. 1 (the initial lifetime map in Figure 13), used as the fluorescence source, contains a fluorophore with a lifetime of 14 ns on a background with a lifetime of 4 ns. Total exposure times of 10 ms, 100 ms and 2 s are simulated, and the output data of the circuits is processed by MLE, CMM and LSM. The simulation results are shown in Figure 13. The outline of the fluorophore is clearly visible when the total exposure time is 10 ms, which means that the imaging frame rate can reach 100 fps.
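For orientation, the center-of-mass method (CMM) can be sketched in its simplest form as the mean photon arrival time. This is a minimal illustration and not the processing chain used for the figures: it assumes an ideal mono-exponential decay without background, DCR or pile-up, a 100 ns observation window, and no correction for window truncation. The function names, the photon count and the random seed are hypothetical.

```python
import numpy as np


def cmm_lifetime(arrival_times_ns):
    """Uncorrected center-of-mass estimate: mean photon arrival time."""
    return float(np.mean(arrival_times_ns))


def simulate_cmm(tau_ns, n_photons, window_ns=100.0, seed=0):
    """Draw ideal exponential arrival times, keep those inside the window,
    and return the CMM estimate (all parameters are illustrative)."""
    rng = np.random.default_rng(seed)
    t = rng.exponential(tau_ns, size=n_photons)
    return cmm_lifetime(t[t < window_ns])


if __name__ == "__main__":
    for tau in (5.0, 10.5, 14.0, 20.0):
        print(tau, simulate_cmm(tau, 200_000))
```

With a finite window T, the mean of a truncated exponential is τ − T/(e^(T/τ) − 1), so the uncorrected estimate is biased low as τ approaches the window length. This may be one contributor to the behavior of CMM at longer lifetimes, although the text does not state the cause.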
Picture No. 2 (the initial lifetime map in Figure 14), used as the fluorescence source, contains a fluorophore with a lifetime of 10.5 ns. The simulation results are shown in Figure 14. The outline of the fluorophore becomes difficult to distinguish when the imaging frame rate approaches 100 fps, as the variance of the estimated lifetimes is no longer negligible. At 10 fps, i.e., 100 ms exposure time, the fluorophore's profile can be distinguished.
Table 1 shows the mean values, variances and FoMs of the detection results for pixel arrays with a 10.5 ns lifetime. The figure of merit (FoM) of fluorescence lifetime imaging is expressed in terms of σ_τ, the standard deviation of the estimated lifetimes, and Δτ, the average offset of the estimated lifetimes.
Table 2 summarizes the detailed simulation results for the estimated fluorophore lifetimes. The volatility of the FoM is attributed to the limited sample size: since circuit simulation is time-consuming, only 20 M cycles are simulated. It can be concluded that the FoM of LSM is considerably smaller than that of MLE and CMM. For fluorescence lifetimes between 5 ns and 14 ns, CMM is close to MLE in terms of FoM, but for lifetimes of 14 ns or longer, the FoM of CMM decreases noticeably.
The simulation results above indicate that moderate measurement accuracy is attainable using the full parallel event-driven readout method.
The CMM and MLE algorithms can both be used to estimate fluorescence lifetimes. MLE can extend the lifetime resolvability range when a high-performance computer is available, whereas CMM has the advantage of easy hardware implementation.
Table 3 summarizes the performance of the chip and compares it with other works. A low-repetition-rate excitation laser is used in order to extend the lifetime resolvability range. The low power dissipation is attributed to the reduced number of TDCs. The chip's maximum readout sample rate is 10 MSps, which is faster than that reported in all but one of the previously published works referenced in Table 3. The frame rate is defined as the lifetime generation rate, where each lifetime is estimated from a histogram consisting of a large number of photons. The maximum frame rate in this work is 100 fps. The smaller number of collected photons means a lower signal-to-noise ratio compared with the previous work in Reference [14] or [18]. Meanwhile, the output bandwidth is reduced to 720 Mbps. The figure of merit (FoM) of the proposed chip can be expressed as follows:
$$\mathrm{FoM} = \frac{\mathrm{Power}}{\mathrm{Resolution} \cdot \mathrm{Readout\_Sample\_Rate}} \tag{9}$$
The FoM of the proposed structure is 5.86 pJ/(sample·pixel). The extremely low chip FoM is attributed to the implementation of full parallel event-driven readout, the small resolution and the reduction of measuring accuracy.
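The quoted chip FoM can be reproduced from the figures given in the text (15 mW average power, 16 × 16 pixels, 10 MSps readout sample rate). The short check below is only a worked instance of Equation (9), and the function name is hypothetical.

```python
def chip_fom_pj(power_w: float, n_pixels: int, sample_rate_sps: float) -> float:
    """Energy per sample per pixel, in picojoules (Equation (9))."""
    return power_w / (n_pixels * sample_rate_sps) * 1e12


if __name__ == "__main__":
    # 15 mW, 16 x 16 pixels, 10 MSps
    print(round(chip_fom_pj(15e-3, 16 * 16, 10e6), 2))
```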

Conclusions
In this work, we have proposed a full parallel event-driven readout method, implemented in an area array SPAD image sensor for high-speed FLIM. The maximum imaging frame rate is 100 fps, and the number of TDCs is reduced to 16. The average power consumption is 15 mW, and the output bandwidth is reduced to 720 Mbps. We analyzed the imaging error caused by the pile-up effect that the readout method induces. The lifetime resolvability range is 5–20 ns, and the average error of the estimated fluorescence lifetimes is below 1%. The proposed readout method is suitable for video-rate FLIM with resolvability of long decay lifetimes.