One of the leading arguments to move to 3D integration is to independently choose the best process for both the SPAD array layer and the CMOS readout electronics layer. This section presents key design choices that must be made to obtain an optimized 3D PDC for radiation instrumentation concerning the SPAD array, the CMOS technology, the quenching circuit, the time-to-digital converter, the digital signal processing and the system level integration.
6.1. SPAD Array
The SPAD array is either designed in an FSI or BSI configuration. In both cases, the implementation requires freedom of design to optimize the SPAD features (junction profile, guard ring, shape and dimensions, trenches, passivation layer, etc.). This optimization can benefit from the advancements made for analog SiPMs [101
]. In both configurations, the doping profiles are designed so that the photoelectrons initiate the avalanche, since in silicon the ionization coefficient is greater for electrons than holes [33
]. Figure 10
illustrates typical BSI and FSI configurations. FSI refers to the SPAD junction being located on the side of the incident light similar to planar SPADs, such as in the MPD SPAD [102
] or as in a typical 2D CMOS SPAD [57
]. This configuration requires 3D processing to electrically connect each SPAD to the CMOS readout circuit (thinning, TSVs, backside interconnects, bonding, etc.) [72
]. BSI refers to the junction being located on the opposite side of the incident light, near the CMOS layer. This configuration requires a bonding process to electrically connect each SPAD to the CMOS [91
]. The sensor layer must be thinned down (tens of microns or lower) to detect photons from the visible range, similar to a BSI CCD [106
In the BSI scheme, the absorption/drift region is distinct from the multiplication/avalanche region. The sensor cross-section is similar to a reach-through SPAD, where photocarriers must drift toward the junction before triggering an avalanche [33
]. This contrasts with the FSI scheme, where those regions overlap. The BSI design must therefore be fully depleted, with a lightly doped p-type or intrinsic absorption region [80
]. In the FSI case, the structure is narrowly depleted and the profile type (p+n or n+p) must be chosen according to the wavelength of interest, depending on whether most photons are absorbed above the depletion region (p+n) or below the junction (n+p). For example, consider a junction lying 100 nm below the surface. Detection of photons with an absorption length shorter than 75 nm ( nm) favors a p+n junction (more than 3/4 of photons are absorbed before the junction), whereas detection of photons with an absorption length greater than 350 nm ( nm) favors an n+p structure (more than 3/4 of photons are absorbed below the junction). The FSI’s non-depleted region induces a series resistance between the SPAD and the CMOS electronics that must be below a few
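The absorbed fractions quoted in the junction-depth example can be checked with a simple exponential (Beer-Lambert) absorption model. The following is a minimal sketch, assuming an abrupt junction at a single depth, no surface reflection and a single absorption length per wavelength; the function names are illustrative and do not come from the cited works.

```python
import math


def fraction_absorbed_above(depth_nm: float, absorption_length_nm: float) -> float:
    """Fraction of photons absorbed between the surface and depth_nm."""
    return 1.0 - math.exp(-depth_nm / absorption_length_nm)


def fraction_absorbed_below(depth_nm: float, absorption_length_nm: float) -> float:
    """Fraction of photons that survive to depth_nm and are absorbed deeper."""
    return math.exp(-depth_nm / absorption_length_nm)


junction_depth_nm = 100.0  # junction depth used in the example of the text

# Short absorption length: most photons stop above the junction (p+n case).
shallow = fraction_absorbed_above(junction_depth_nm, 75.0)
# Long absorption length: most photons pass the junction (n+p case).
deep = fraction_absorbed_below(junction_depth_nm, 350.0)

print(f"absorbed above the junction (L = 75 nm):  {shallow:.3f}")
print(f"absorbed below the junction (L = 350 nm): {deep:.3f}")

# Absorption lengths giving exactly 3/4 on each side under this model.
print(f"3/4 above: L = {junction_depth_nm / math.log(4.0):.1f} nm")
print(f"3/4 below: L = {junction_depth_nm / math.log(4.0 / 3.0):.1f} nm")
```

Under these assumptions, roughly three quarters of the photons are absorbed on the favorable side of the junction in both cases (about 74% and 75%), and the exact 3/4 crossover lengths of about 72 nm and 348 nm are consistent with the rounded values in the text.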
The main driver to use the BSI configuration is to maximize light absorption, as shown by the PDE comparison between planar and reach-through SPADs [109
]. As in BSI CCDs, the BSI SPAD fill-factor tends toward unity because the illuminated side is not impaired by the presence of contact electrodes or other components such as those found in FSI devices (isolation trench, TSV, etc.). The BSI configuration is mainly used for near-infrared applications [70
], where photons are absorbed over tens of microns. A recent work shows that some blue-enhanced/NUV architectures are also possible for the BSI configuration [110
]. The FSI approach works mainly for the blue and UV range of the spectrum, since the penetration depth of these photons is below one micron (Figure 11
) and therefore, inside the depleted region. Some red-enhancement architectures are also possible for the FSI configuration [102
Deep UV (<300 nm) light detection is needed in some applications, such as in low-background physics experiments in liquid argon and liquid xenon [12
]. Photons with wavelengths below 360 nm penetrate less than 10 nm below the silicon surface (Figure 11
). To maximize the PDE of deep UV photons, the energy band profile must be engineered to prevent photoelectrons from getting trapped and from recombining at the surface without reaching the multiplication region. Band profile re-engineering for UV sensitivity is a well-known process for BSI CCDs [106
]. In the early 1990s, the Jet Propulsion Laboratory (JPL) pioneered the delta-doping technique, a method using low-temperature molecular beam epitaxy (MBE) on BSI CCDs to improve their quantum efficiency [114
]. The principle is illustrated in Figure 12
. For a p+n diode, a few atomic layers of highly boron-doped silicon are epitaxially grown on top of the silicon to create a peak in the band profile, from 2 nm to 4 nm below the surface. This increases the probability of collecting photoelectrons that have been generated below the barrier through enhanced charge drifting. Although photoelectrons created above the barrier can still be trapped and lost, the barrier nevertheless has the advantage that any undesired electrons thermally excited at the surface will not make it to the avalanche region, thus blocking that source of noise.
The growth of a highly doped passivation layer by MBE has also been demonstrated on BSI SPADs [117
]. The deposition of a thin anti-reflection coating (ARC), optimized for near-UV light, protects the MBE layer and further increases the PDE of the detector to above 70% at 405 nm. Applying this technique to an FSI 3D PDC is challenging because of the surface topology (passivation layer, metal routing, etc.), as opposed to the flat backside surface of BSI detectors. To address this, one could consider performing the MBE during sensor processing rather than as a post-processing step. Furthermore, ARCs are also needed to optimize the SPAD PDE [118
]. The transmission, reflection and absorption of the passivation layer covering the SPADs’ photosensitive area depend on the wavelength and angle of incidence of the incoming photons. Designing an ARC for the VUV is challenging because most materials absorb strongly at these wavelengths, so the layers must be very thin and tightly controlled [121
The main driver to use the FSI configuration is to optimize the timing resolution of the 3D PDC. The BSI absorption layer limits the SPTR of the SPAD since photoelectrons suffer from propagation time variations. This effect contributes to the timing jitter and is observed in reach-through SPADs [54
]. For instance, the low-noise SLiK SPAD (i.e., a thick BSI SPAD) has a ~30 µm thick reach-through region and exhibits an SPTR of >200 ps and a PDE of 65% at 650 nm [108
]. In contrast, a BSI SPAD made in TSMC 45 nm and thinned down to less than 3 µm has an SPTR of ~100 ps and a PDE of 25–30% at 637 nm [92
]. This illustrates the trade-off between the PDE and the SPTR of the SPAD in relation to the thickness of the absorption region for a given wavelength of interest. The FSI design also suffers from this trade-off, but to a lesser extent since the absorption layer overlays the multiplication region. This allows extremely low SPTR (<20 ps FWHM), as first demonstrated with a planar SPAD in a custom process [124
]. FSI SPADs made in standard CMOS technologies can also reach ultimate timing resolution (<10 ps FWHM), at the cost of high noise and low PDE [125
In terms of noise, in both BSI and FSI cases, the junction itself and the resulting electric field must be properly engineered to minimize DCR caused by band-to-band tunneling (BTBT) and field-enhanced Shockley-Read-Hall (SRH) generation [101
]. In addition, all damage and defects in the active region must be minimized to avoid noise generation centers [103
]. This is especially true for the BSI configuration, since the depleted volume is larger than in the FSI configuration [101
]. DCR increases when the depleted volume is made larger [126
]. The correlated noise (afterpulsing and crosstalk) is also a nuisance for both BSI and FSI configurations and is strongly dependent on the implementation (optical isolation trench, charge per avalanche, etc.). For instance, a BSI SPAD design was introduced in the mid-2000s to reduce correlated noise while maintaining a large fill-factor [127
]. This design keeps the absorption layer unchanged and reduces the junction’s diameter to minimize the charge per avalanche and hence the afterpulsing and crosstalk noise. Electric field engineering is needed to funnel the photocarriers into the point-like multiplication region.
In terms of light detection, both BSI and FSI are suitable for radiation detection applications (visible, UV and VUV range). Because of the better achievable fill-factor, the BSI design has a slight edge over the FSI for rare signal/low-background applications such as experiments in noble liquids [12
]. Because of the better achievable SPTR, the FSI design is relevant for precise timing experiments and ToF applications such as ToF computed tomography [128
] and ToF-PET [129
]. Achieving key characteristics (high PDE, optimal SPTR and low noise) brings design trade-offs and the different applications in radiation instrumentation require the pursuit of both BSI and FSI 3D PDC implementations.
6.2. 3D Vertical Integration
One of the leading arguments to rely on 3D vertical integration is the possibility to choose the best process to design and fabricate the SPAD array and to independently choose the most appropriate CMOS process for the readout electronics. Optimal technologies will bring optimal performances of both SPADs (PDE, noise, SPTR) and CMOS electronics. However, more than this obvious argument, the 3D integration allows for higher fill-factor, higher degree of integration into systems (e.g., large-scale photodetectors) and for a less capacitive and more uniform connection to the SPAD to reach ultimate performance in timing resolution and/or power dissipation.
Although the SPAD technology is selected as described above, the CMOS process is chosen based on the complexity of the desired readout electronics per SPAD, the compatibility with 3D integration, the availability of MPW to prototype the CMOS prior to purchasing wafers for 3D assembly and the cost. Advanced 3D integration processes usually require the handling of full wafers (as opposed to die-to-die assembly techniques). Purchasing complete CMOS wafers (as opposed to sharing in multi-project runs) is expensive and limits R&D to the big players of the industry, to large institutions such as national laboratories or to large-scale physics experiments. There are many methods to perform 3D integration (molecular bonding, microbump bonding, direct bond interconnect, etc.) [130
]. To give a measure of the level of integration that these methods allow, recent progress in microbump bonding can accommodate 16 µm bumps with 35 µm spacing, which is reasonable for a SPAD pitch varying between 50
A game changer for the development of 3D PDCs would be to identify a 3D assembly process that could be done at chip level (i.e., not requiring complete wafers) to cut costs. In a similar line of thought, one could use a SPAD array design (mainly a generic geometry with ARC and PDE optimized for a specific application) and mate it with any CMOS readout electronics with the proper pitch and application-specific functionalities. Such an approach would allow researchers interested in 3D PDCs to cooperate, identify a CMOS technology and a 3D bonding process, and develop the SPAD array with an industrial collaborator. This could lead to sharing the cost of a multi-project 3D PDC process run in which each team would have their custom readout circuits for specific applications.
6.4. Quenching Circuit
The QC is composed of three main parts: the quenching and recharge branch, the sensing and discriminating circuit and finally the monostable circuits to set the hold-off time and recharge time for afterpulsing mitigation. The literature on QC architecture (in 2D) is quite extensive and readers are referred to these two articles as a starting point [22
For the quenching and recharge branch, multiple architectures are present in the literature: traditional passive, active or mixed (passive quenching–active recharge, variable load quenching, etc.) [22
]. The main challenges for the quenching and recharge branch are to allow for a maximum SPAD excess voltage for the PDE while minimizing the capacitance on the node for the timing and noise. The quenching circuit will need to swing the SPAD electrode from the excess bias voltage to below breakdown. The breakdown voltage of the MOSFET’s drain junction to the CMOS substrate limits the voltage excursion and care must be taken when choosing the technology. One option is to use a CMOS process with HV devices, allowing for a greater voltage swing, but at the cost of larger area CMOS HV devices, limiting the available real estate for the other transistors of the quenching circuit. In recent years, new architectures have been proposed to allow a greater excess voltage through different implementations of cascode transistors [48
]. For example, one architecture allows for an excess voltage as high as 3 times the maximum voltage of the CMOS technology [136
]. Please note that the impact of these cascode-based architectures on the timing jitter of the SPAD-QC pair has not yet been demonstrated.
To reduce the timing jitter, the capacitance at the node of the detector should be minimized to increase the slope of the signal. A mixed architecture such as passive quenching–active recharge or a variable load quenching allows for minimization of the capacitance directly connected to the SPAD, as compared to a full active quenching circuit, by reducing the number of required transistors at the reading node [125
]. On the other hand, to reduce correlated noise and power consumption, both minimizing the capacitance and stopping the avalanche before the complete discharge of the SPAD limit the number of carriers involved [67
]. To this end, the sensing and discrimination node must sense the avalanche to provide a positive feedback to the quenching branch to quench the avalanche as soon as possible. As the afterpulsing and optical crosstalk are functions of the number of charges involved, the QC should swiftly detect the avalanche and stop it, an advantage of active quenching circuits over passive implementations.
For the sensing and discriminating circuit, the QC acts like a classic leading-edge discriminator that can be implemented with a simple inverter. An inverter has the advantages of low power consumption and small area. In a 3D architecture, more real estate is available and a comparator with an adjustable threshold can be implemented to obtain a better SPTR [137
]. One of the contributions to the timing jitter is given by the ratio of the signal noise over the slope of the signal at the discrimination point [125
]. Being able to set the threshold at the optimal value is the advantage of a comparator-based architecture over the inverter. Considering this, in a 3D PDC optimized for large area detectors where the power consumption is a critical requirement, an inverter is preferred. On the other hand, for an SPTR optimized 3D PDC for PET, a comparator should be implemented [48
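The noise-over-slope contribution to the timing jitter mentioned above can be illustrated numerically. The sketch below assumes a leading-edge discriminator with a fixed RMS voltage noise and a signal slope that depends on where the threshold is placed; all numerical values and names are hypothetical and are not taken from the cited measurements.

```python
def threshold_jitter_ps(noise_rms_mv: float, slope_mv_per_ps: float) -> float:
    """Timing jitter (ps RMS) caused by voltage noise at the threshold crossing."""
    if slope_mv_per_ps <= 0.0:
        raise ValueError("the signal slope at the threshold must be positive")
    return noise_rms_mv / slope_mv_per_ps


def best_threshold(noise_rms_mv: float, slope_by_threshold: dict) -> float:
    """Return the threshold (mV) whose local slope gives the lowest jitter."""
    return min(
        slope_by_threshold,
        key=lambda thr: threshold_jitter_ps(noise_rms_mv, slope_by_threshold[thr]),
    )


# Hypothetical slope of the SPAD signal (mV/ps) at a few threshold settings (mV).
slopes = {50.0: 0.10, 150.0: 0.50, 600.0: 0.25}
noise_rms_mv = 2.0  # hypothetical RMS noise at the discriminator input

for threshold_mv, slope in sorted(slopes.items()):
    jitter = threshold_jitter_ps(noise_rms_mv, slope)
    print(f"threshold {threshold_mv:6.1f} mV -> {jitter:5.1f} ps RMS")

print("lowest-jitter threshold:", best_threshold(noise_rms_mv, slopes), "mV")
```

An inverter fixes the threshold at its switching point, whereas a comparator can be tuned to the setting where the slope is steepest, which is the trade-off against power and area described in the text.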
Multiple comparator-based QCs have been developed to optimize the timing jitter. In [137
], the authors report that an excellent SPTR (35 ps FWHM) can be obtained if the avalanche is detected at low levels while the multiplication process is still confined to the photoelectric interaction point. Following that, multiple SPAD-QC pairs with a comparator were developed to reach an SPTR of 27
] and finally
]. The adjustable threshold is also useful for characterization purposes: scanning the threshold over the input dynamic range provides information on SPAD breakdown voltage variations and the rise time of the SPAD signal [125
]. For example, studies show that the excess voltage of a single CMOS 65 nm SPAD can vary up to 30 mV FWHM [125
] and up to 60 mV FWHM for a CMOS 180 nm SPAD [141
]. This provides important information: the QC must have low propagation delay variation as a function of the SPAD signal amplitude variations as it gets convolved with its timing jitter [125
]. Additionally, in an array configuration, the QCs must have uniform routing propagation delay because any variation gets convolved with the system timing jitter. If the SPAD address is recorded, the digital signal processing unit of the ASIC can correct for the routing propagation delay variation and align the mean of the timing spectrum of each channel using calibration data stored in an on-chip lookup table (as discussed in Section 6.6.1).
Each QC should include two monostable circuits: one that implements the hold-off delay used to minimize afterpulsing, and a second that gates the QC output until the SPAD is fully recharged to prevent spurious triggering and ensure a more uniform SPAD excess voltage while operating. Finally, the possibility to enable or disable each SPAD of a PDC opens doors to new characterization methods not possible with analog SiPMs. For instance, one will be able to study the impact of optical crosstalk as a function of the relative position with respect to the emitting SPAD or characterize a single SPAD within an array [55
6.5. Time-to-Digital Converter
Many detectors in radiation instrumentation require the time of interaction and to obtain it, multiple circuits can be implemented. For a rough estimate, a counter on a clock signal can provide the basic information needed. For very high timing precision, a TDC allows for precise measurement between a start and a stop signal. Key parameters of a TDC are the timing jitter, the least significant bit (LSB), the area, the power consumption and the conversion time.
In the field of radiation instrumentation, there is a trend to reach sub-10 ps FWHM timing jitter for applications such as PET imaging using prompt photons [129
], ToF computed tomography [128
] and time-resolved calorimetry. It is quite a challenge to achieve a TDC with such an LSB and jitter while having a small area and a low power consumption, two parameters required in large systems such as a PET scanner.
The TDC LSB represents the smallest time step the TDC can measure and has a direct impact on the timing jitter through the quantization error (LSB/√12 RMS). For ToF-PET, the sub-10 ps FWHM timing jitter sets the LSB to about 5 ps, limiting the types of TDC architecture that can be implemented. The conversion time of the TDC must be as short as possible to minimize its dead time, the time interval during which it cannot timestamp another event. Please note that this can be mitigated in certain applications such as ToF-PET. For example, during the time a TDC is converting its timing information to a digital format, the circuit can still count the number of times the SPAD-QC is triggered, hence minimizing the impact of the device dead time and providing a way to count all photons (e.g., for energy measurements) [48
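The link between the LSB and the jitter budget can be made concrete with the standard quantization error of a uniform quantizer, LSB/√12 RMS. The sketch below assumes that the quantization error adds in quadrature with the other jitter sources and that the total is roughly Gaussian, so that FWHM ≈ 2.355 × RMS; both are simplifying assumptions, and the function names are illustrative.

```python
import math

# Conversion factor from RMS to FWHM, valid for a Gaussian distribution.
FWHM_PER_RMS = 2.0 * math.sqrt(2.0 * math.log(2.0))


def quantization_jitter_rms_ps(lsb_ps: float) -> float:
    """RMS quantization error of an ideal TDC with a uniform LSB."""
    return lsb_ps / math.sqrt(12.0)


def total_jitter_fwhm_ps(lsb_ps: float, other_rms_ps: float) -> float:
    """Total jitter (FWHM), adding quantization and other sources in quadrature."""
    total_rms = math.hypot(quantization_jitter_rms_ps(lsb_ps), other_rms_ps)
    return FWHM_PER_RMS * total_rms


def remaining_budget_rms_ps(lsb_ps: float, target_fwhm_ps: float) -> float:
    """RMS jitter left for all other sources once quantization is accounted for."""
    target_rms = target_fwhm_ps / FWHM_PER_RMS
    q = quantization_jitter_rms_ps(lsb_ps)
    if q >= target_rms:
        return 0.0  # quantization alone already exceeds the target
    return math.sqrt(target_rms**2 - q**2)


for lsb in (5.0, 10.0, 20.0):
    q_fwhm = total_jitter_fwhm_ps(lsb, 0.0)
    left = remaining_budget_rms_ps(lsb, 10.0)
    print(f"LSB {lsb:4.1f} ps: quantization {q_fwhm:5.2f} ps FWHM, "
          f"{left:4.2f} ps RMS left for a 10 ps FWHM target")
```

Under these assumptions, a 5 ps LSB contributes about 1.4 ps RMS on its own, which leaves most of a 10 ps FWHM budget for the SPAD, the QC and clock distribution, whereas a 20 ps LSB would exceed the budget by itself.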
Limited by real estate, most 2D architectures rely on queuing theory to implement time-to-digital conversion within groups of SPADs [142
]. Although this was shown to be functional, the ultimate coincidence timing resolution (CTR) can only be met by using one TDC per SPAD or per small group of SPAD-QC pairs to maximize the probability of timestamping the first prompt photons [144
]. In theory, one TDC per SPAD-QC seems the best choice since it allows for correction of the timing skew of each pixel of a SPAD-QC-TDC. However, it comes with many challenges such as limited area (<
) and power consumption per TDC (< 100
If one TDC per SPAD-QC pair is required, it must be relatively small to fit alongside the QC and other circuits under the SPAD real estate with the same footprint. Combining this requirement with sub-100 µW power consumption, 10 ps FWHM jitter and 5 ps LSB limits the TDC architectures that can be used. One of the prevalent architectures to reach both a small area and a small LSB is the single stage Vernier architecture [146
]; however, work is still required to reach the 4 combined key parameters. For example, Vernier ring oscillator TDCs can achieve low area (<
) and low power consumption (< 22 µW) with a
ps LSB, providing a timing jitter of about 13 ps FWHM [133
]. The same architecture is implemented in an array of 256 TDCs per 1
to achieve a one-to-one coupling and the resulting timing jitter for the whole array is about 40 ps FWHM (18 ps RMS). This timing jitter degradation is mainly due to non-uniformities between TDC LSBs and common mode noise injection from the number of TDCs running on-chip [48
]. To minimize the common mode noise, one could implement one TDC per small group of SPAD-QC pairs and still apply corrections for the timing skew of each SPAD by adding an auxiliary circuit such as an arbiter to identify the SPAD that triggered [48
]. The challenge is to design an arbiter with sub-10 ps timing precision with multiple inputs (i.e., the number of SPADs linked to a single TDC). That being said, the brute force approach of providing one TDC per SPAD is most likely excessive for many applications and a careful study must be performed to identify the right ratio of SPADs per TDC to meet the specifications. For example, if timing is not critical (>250 ps) for a specific application (e.g., nEXO), then there are other TDC architectures or other TDC implementation schemes that can be considered. This does not discard the other advantages of the 3D architecture: optimizing the number of SPADs per TDC frees real estate for TDC improvements and advanced signal processing.
With respect to timing resolution, 3D integration allows for a SPAD of ideal geometry (particularly a uniform current collection) to be integrated at a short and uniform distance from a low jitter quenching circuit (e.g., comparator-based) with a stabilized low jitter TDC and further allows for signal processing to use address-based calibration data to compensate timestamp non-uniformity.
6.6. Digital Signal Processing
As pixels are getting smaller and systems (scanners or physics experiments) keep growing in size or density, the amount of data to gather and process is also becoming an issue. With 3D PDCs, various signal processing functions can be implemented at the sensor level to filter, select, condition and compress the data to limit the required bandwidth, and hence reduce the system power consumption.
In 2D, the digital signal processing requires some area on the periphery of the SPAD-QC array, thereby reducing the system-level photosensitive fill-factor. This is more easily mitigated in a 3D PDC because signal processing is done by the CMOS tier under the SPAD array. For instance, if the combined area of the QC and the TDC is smaller than the SPAD area, digital signal processing and data transmission circuits can be distributed in each pixel of the ASIC without impacting the system-level fill-factor.
The embedded digital signal processing is tailored to an application and can be very simple, such as photon counts within a time window, or highly sophisticated, such as a multi-timestamp estimator with inline correction. To exemplify this, two application cases will be explored: (1) a preclinical/brain time-of-flight PET (ToF-PET) scanner with tight requirements on SPTR, and (2) a 3D PDC designed for large-scale integration (multiple square meters) operating at cryogenic LAr and LXe temperatures.
6.6.1. Time-of-Flight PET Scanner
presents the block diagram of the digital signal processing required for PET imaging with the goal of eventually reaching sub-10 ps FWHM array wide [48
]. In ToF-PET, a burst of prompt photons followed by scintillation photons with a sharp rise time (sub-ns) and a long decay (tens of ns, depending on the crystal used) should be measured. The role of the readout is to timestamp the first prompt photons and to count all following photons during a given time interval. The timestamps are used to identify coincidence between PET events and the photon count gives the energy of the measured gamma to discriminate Compton events.
To avoid acquiring dark noise, and thereby reduce the data bandwidth, a dark noise discrimination filter must be implemented; the acquisition is started only when a certain number of columns (programmable threshold) in the array have at least one trigger within 6 clock cycles [48
]. If the threshold is not met, all TDCs are reset to be ready for another event. Other implementations of dark count filters can be found in the literature [57
]. Filtering out dark noise is a major advantage of PDCs that reduces the output bandwidth and power consumption, knowing that a significant amount of power is lost and dissipated in the data transfer between electronic circuits.
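The column-based dark noise filter described above can be modelled in software as a sliding-window test. This is a behavioural sketch only, assuming that triggers are available as (clock cycle, column) pairs and that "within 6 clock cycles" means a span of fewer than 6 cycles between the first and last trigger considered; the actual hardware implementation in the cited ASIC may open and close its window differently.

```python
from typing import Iterable, Tuple


def event_accepted(
    triggers: Iterable[Tuple[int, int]],
    column_threshold: int,
    window_cycles: int = 6,
) -> bool:
    """Return True if enough distinct columns fire within one window.

    triggers: (clock_cycle, column) pairs, in any order.
    column_threshold: programmable number of distinct columns required.
    window_cycles: window length in clock cycles (6 in the text).
    """
    ordered = sorted(triggers)
    start = 0
    for end in range(len(ordered)):
        # Drop triggers that are too old to share a window with the newest one.
        while ordered[end][0] - ordered[start][0] >= window_cycles:
            start += 1
        columns = {column for _, column in ordered[start:end + 1]}
        if len(columns) >= column_threshold:
            return True
    return False


# Hypothetical inputs: sparse dark counts versus a burst of scintillation light.
dark_counts = [(0, 3), (40, 7), (95, 1)]
scintillation = [(100, 0), (101, 4), (103, 9), (104, 4)]

print("dark counts accepted:  ", event_accepted(dark_counts, column_threshold=3))
print("scintillation accepted:", event_accepted(scintillation, column_threshold=3))
```

Isolated dark counts rarely populate several columns within a few clock cycles, so they are rejected and the TDCs can be reset, whereas a scintillation burst passes the test and starts the acquisition.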
When an event is detected, the whole array is then read out by the digital signal processing module, where the address, timestamp and number of counts are stored. To correct for the time skew between pixels and the TDC LSB non-uniformity throughout the array, a calibration is performed when initializing the ASIC and the correction values are stored on-chip in lookup tables [48
]. These would be difficult to embed in a 2D implementation since lookup tables require large on-chip area. Once the timestamps are corrected and stored, a sorting engine puts the timestamps in chronological order. At this point, a second dark count filter goes through the ordered timestamps and removes the timestamps before the event that are most likely due to dark counts and not photons. The dark count filter is used to increase the multi-timestamp time estimator precision [69
]. To complete the signal processing chain, a best linear unbiased estimator (BLUE) [69
] is used on the first n photons to extract the timestamp of the 511 keV event. To estimate the improvement from this post-processing scheme, hardware-in-the-loop simulations were performed using a fabricated ASIC [48
] and simulated PET events (LYSO scintillator and SPAD) [149
], which showed a coincidence timing resolution (CTR) improvement from 160 ps FWHM to 126 ps FWHM and a bandwidth reduction from 34.7 Mbit/s to 0.5 Mbit/s [69
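The processing chain described above (lookup-table skew correction, sorting, then a best linear unbiased estimator) can be sketched in a few lines. This is an illustrative software model, not the on-chip implementation: it assumes that the mean delay of each photon rank and the covariance matrix between ranks are known from calibration or simulation, and it uses the textbook BLUE form in which the weights are C⁻¹1 / (1ᵀC⁻¹1). All names and values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def correct_and_sort(hits: Sequence[Tuple[int, float]],
                     skew_lut_ps: Dict[int, float]) -> List[float]:
    """Subtract the per-address skew from each timestamp, then sort in time."""
    return sorted(t - skew_lut_ps[address] for address, t in hits)


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def blue_weights(cov: Sequence[Sequence[float]]) -> List[float]:
    """BLUE weights w = C^-1 1 / (1^T C^-1 1); they sum to one."""
    u = solve_linear(cov, [1.0] * len(cov))
    total = sum(u)
    return [value / total for value in u]


def blue_estimate(sorted_ts: Sequence[float], mean_offsets: Sequence[float],
                  cov: Sequence[Sequence[float]]) -> float:
    """Event time estimate from the first n ordered timestamps."""
    weights = blue_weights(cov)
    return sum(w * (t - mu) for w, t, mu in zip(weights, sorted_ts, mean_offsets))


# Hypothetical calibration data and hits: (SPAD address, raw timestamp in ps).
skew_lut_ps = {0: 5.0, 1: -4.0, 2: 10.0}
hits = [(0, 105.0), (1, 98.0), (2, 120.0)]
timestamps = correct_and_sort(hits, skew_lut_ps)

n = 2                                # number of first photons used
mean_offsets = [0.0, 2.0]            # mean delay of each rank (assumed known)
covariance = [[1.0, 0.0], [0.0, 4.0]]  # rank covariance in ps^2 (assumed known)
print("corrected timestamps:", timestamps)
print("BLUE event time (ps):", blue_estimate(timestamps[:n], mean_offsets, covariance))
```

With a diagonal covariance the weights reduce to inverse-variance weighting; ordered photon timestamps are correlated in practice, which is why the full covariance matrix is solved rather than only its diagonal.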
As for the energy of the event, it was shown that the same timestamps can be used to discriminate between a real event and a Compton event [150
]. The number of photons emitted by the scintillator increases with the energy deposited. Assuming that the rise time and decay time of the scintillator are independent of energy, the amplitude of the signal increases with the number of detected photons. Hence, the time interval between two detected photons is smaller for higher energy events and, conversely, larger for PET events of lower energy. To discriminate in energy, the time interval between the first detected photon and the photon of rank k is compared to a programmable threshold. If the time interval is greater than the threshold, the event is classified as a Compton event and rejected; if it is smaller, the event is conserved. All the signal processing described above is performed on-chip.
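The timestamp-based energy discrimination described above amounts to a single comparison per event. The sketch below is a minimal software model; the handling of events with fewer than k photons and of an interval exactly equal to the threshold is not specified in the text, so both are treated as rejections here by assumption. The function name and the numerical values are hypothetical.

```python
from typing import Sequence


def keep_event(sorted_timestamps_ps: Sequence[float], rank_k: int,
               threshold_ps: float) -> bool:
    """Return True if the event is kept, False if rejected as a Compton event.

    sorted_timestamps_ps: photon timestamps in chronological order.
    rank_k: rank (1-based, at least 2) of the photon compared to the first one.
    threshold_ps: programmable limit on the first-to-kth photon interval.
    """
    if rank_k < 2:
        raise ValueError("rank_k must be at least 2")
    if len(sorted_timestamps_ps) < rank_k:
        return False  # assumption: too few photons means too little energy
    interval = sorted_timestamps_ps[rank_k - 1] - sorted_timestamps_ps[0]
    return interval < threshold_ps


# Hypothetical events: dense photon arrivals (high energy) versus sparse ones.
full_energy_event = [0.0, 40.0, 90.0, 150.0, 230.0]
compton_event = [0.0, 120.0, 310.0, 520.0, 800.0]

print("full-energy event kept:", keep_event(full_energy_event, rank_k=5, threshold_ps=400.0))
print("Compton event kept:    ", keep_event(compton_event, rank_k=5, threshold_ps=400.0))
```

Because the comparison reuses timestamps that are already sorted for the time estimator, this kind of energy cut adds very little logic to the on-chip processing chain.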
6.6.2. Liquid Argon and Liquid Xenon Experiments
Some neutrino [96
] and dark matter [13
] experiments rely on the measurement of faint scintillation photons in large volumes of noble liquids. They thus require a large area photodetector system operated at cryogenic temperatures capable of counting single photons. Indeed, 10 to 10,000 photons are to be measured per event and collected over many square meters of detectors [66
]. The scintillation light wavelengths of LXe and LAr are in the VUV range (175 nm and 125 nm respectively), but can be converted to visible light with the use of wavelength shifters (mainly in LAr experiments). Another requirement is the reduction of contaminants (mainly organic) that affect the light yield and the secondary electron lifetime in experiments with time projection chamber (TPC) configurations (e.g., nEXO). These experiments are also labelled as “low-background” because great care is taken to select all construction materials to limit spurious decays in a specific range of energies. Having the detector directly in the scintillation medium makes this radiopurity constraint even more stringent. Fortunately, silicon has been shown to comply with this requirement.
LAr and LXe experiments are real technological and economic drivers for 3D PDCs due to their large scale (nEXO, DUNE, ARGO) [10
]. As many functionalities are similar for LXe and LAr experiments, one could design a 3D PDC based on the same CMOS process, 3D integration approach and SPAD array that would support the various experiments worldwide, while still permitting tailored digital signal processing. In all cases, however, as the 3D PDC would be installed within the noble liquid, one of the main design criteria is low power consumption, one reason being to avoid convection in the time projection chamber.
For example, one can design a readout ASIC prototype that operates in two modes dedicated respectively to the nEXO experiment [96
] and to the ARGO experiment. In Figure 14
, a block diagram is shown of the ASIC fabricated in TSMC 180 nm BCD (Bipolar-CMOS-DMOS) process. The ASIC readout implements an inverter-based quenching circuit per SPAD. Programmable hold-off and recharge delays are provided for the whole array. When a SPAD is triggered, the information can be read by means of 3 output types: (1) a fast, low jitter, global OR-tree flag signal that provides the means to timestamp the triggering event with an external TDC; (2) an analog monitor (see Figure 9
); and (3) a digital data processing unit detailed below for the two applications.
The dedicated nEXO (LXe) mode focuses on power consumption reduction by asynchronous operation: when there are no photons, the power consumption is below 65 W (static power consumption and leakage) for a 5 × 5 array. Each 3D PDC continuously monitors all SPADs and sends a digital flag to an external tile (group of devices) controller when a SPAD is triggered. The controller can count the number of triggered SPADs in a programmable time window (typically 200 ) using the 3D PDC flag signal. When this number exceeds a programmed threshold, a clock will be sent temporarily to execute the digital sum and to enable this data exchange between each 3D PDC and the tile controller, keeping digital switching to a minimum. Please note that by knowing the limited number of photons emitted and the large volume of the experiment, it is likely that the detected photons will be distributed among many 3D PDCs and a decision to read out all channels would come from the external DAQ system. If an event occurs close to the photodetector tiles, then many 3D PDCs within a given area will count many photons, justifying the flexibility of this functionality.
The LAr mode is designed to perform pulse shape discrimination (PSD). Pulse shape discrimination in LAr makes use of the difference in scintillation decay time between nuclear recoil events and electronic recoil background events. Event discrimination is based on the fraction of light detected in the first tens of nanoseconds of an event with respect to the whole event duration (∼
). It allows for rejection of otherwise dominant backgrounds from beta and gamma radiation at the
]. In LAr mode, the SPAD-QC cells still asynchronously perform the avalanche monitoring, quenching, hold-off and recharge phases. The difference with the nEXO mode is that a clock sets a fast frame rate (10 ns minimum). The digital adder result is pushed into a 128-bit deep FIFO that stores the number of counts in bins of configurable width (multiples of the 10 ns clock period). When an event is identified by the tile controller and signaled back to the 3D PDCs, the device stores a set number of short bins then switches to storing longer time bins for the rest of the event duration. All these are configurable parameters. This gives access to the relevant signal waveform to perform PSD. In addition, the low jitter OR-tree signal can be used with a TDC to timestamp the event. A virtual fiducial volume can thus be created using time-of-flight, provided that the peak-to-peak timing jitter lies between 200 ps and 1 ns. The TDC could be implemented in the 3D PDC or in the tile controller.
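The pulse shape discrimination variable described above, the fraction of light detected in the first tens of nanoseconds, can be computed directly from the binned counts stored in the FIFO. The sketch below assumes a waveform made of a few short bins followed by longer ones, and splits a bin that straddles the prompt window in proportion to its width; the bin widths, counts and window length are hypothetical and not taken from the ASIC specification.

```python
from typing import Sequence


def prompt_fraction(bin_counts: Sequence[int], bin_widths_ns: Sequence[float],
                    prompt_window_ns: float) -> float:
    """Fraction of detected photons within the first prompt_window_ns."""
    total = sum(bin_counts)
    if total == 0:
        return 0.0
    prompt = 0.0
    bin_start_ns = 0.0
    for count, width in zip(bin_counts, bin_widths_ns):
        bin_end_ns = bin_start_ns + width
        if bin_end_ns <= prompt_window_ns:
            prompt += count  # bin entirely inside the prompt window
        elif bin_start_ns < prompt_window_ns:
            # Straddling bin: assume counts are uniformly spread over the bin.
            prompt += count * (prompt_window_ns - bin_start_ns) / width
        bin_start_ns = bin_end_ns
    return prompt / total


# Hypothetical waveform: four 10 ns bins followed by two 100 ns bins.
widths_ns = [10.0, 10.0, 10.0, 10.0, 100.0, 100.0]
fast_event = [40, 20, 5, 5, 10, 20]
slow_event = [10, 8, 6, 6, 40, 30]

for name, counts in (("fast-dominated", fast_event), ("slow-dominated", slow_event)):
    print(f"{name}: prompt fraction = {prompt_fraction(counts, widths_ns, 40.0):.2f}")
```

Storing short bins at the start of the event and longer bins afterwards keeps the prompt window well resolved while limiting the FIFO depth needed to cover the full event duration.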
6.8. Tiles for Large-Scale Detector in Cryogenic Operation
To build large-scale detectors, the 3D PDCs need to be assembled into tiles, grouped units of 10
]. The tiles must efficiently pave the detector to maximize the photosensitive area, provide on-site signal conditioning and send data to the external data acquisition system. Experiments such as nEXO can only tolerate a low radioactive background and a low number of impurities, which prohibits the use of PCBs made of FR4 or any common organic laminated materials [66
]. To improve the reliability and minimize the induced stress at cryogenic temperature, the coefficient of thermal expansion (CTE) of the tile should be closely matched to the 3D PDC’s CTE [159
]. Interposers are good candidates to replace the functionalities of a PCB in low radioactive cryogenic particle physics experiments. Most of the commercially available technologies limit the tile area below 10
. With the large interest in 2.5D/3D electronics, interposers are a hot topic in the microelectronics industry. Indeed, they offer a level of integration between a PCB and an integrated circuit [161
]. Glass and quartz interposers have attracted great interest from the research community because their high substrate isolation increases RF capabilities compared to lossy silicon substrates [162
]. Even if glass and quartz substrates are good candidates for RF transmission, large area commercial interposer applications are limited with those materials [164
]. Fabrication on silicon substrates is a process well understood by the microelectronics industry, which allows the fabrication of small silicon interposers with multiple redistribution layers (RDLs). For cryogenic operation, using silicon as the tile core allows the CTE to be matched with that of the 3D PDCs.
To minimize complexity as well as yield and development risks, passive interposers are the baseline technology. However, future requirements (5–10 years) would most likely benefit from built-in active circuits. For instance, active interposers could embed voltage regulators, digital-to-analog converters, line drivers, signal equalizers, decoupling capacitors or even phase-locked loops. Active interposers can support distributed electronics without reducing the circuit density. The development of a large-area multi-RDL silicon interposer, be it passive or active, could very well provide the system integration necessary to create the required 3D PDC tile for particle physics experiments.
6.9. Implementation Challenges
Implementing 3D PDCs presents several challenges. As the SPAD and the CMOS are not implemented in the same fabrication process, one needs to prototype both independently. The CMOS technology must be chosen early in the project according to the excess voltage to be quenched and the digital signal processing circuit requirements, but also according to 3D assembly constraints. The readout circuit can then be fabricated by itself to validate all functionalities and performance without the SPAD array bonded to it. To have realistic input test signals, SPADs can be integrated directly into the readout CMOS technology. This is also the opportunity to develop all the ancillary electronics needed to communicate with, configure and control the readout electronics prior to having a 3D device.
The challenge in developing the SPAD array layer of a 3D PDC in a custom process is that no transistors are available in that layer, and hence there is no quenching circuit with which to test the SPADs. There is no straightforward way to infer all the SPAD characteristics (i.e., in Geiger mode) from DC measurements, in particular timing resolution. To overcome this issue, we have designed a dedicated integrated circuit (dubbed “ChipProbe” [141
]) with arrays of quenching circuits made to read out SPADs that are connected either in a flip-chip configuration or by wire bonds, as shown in Figure 15
. The quenching circuits have been specially designed to overcome the added capacitance that the SPAD-to-ChipProbe interconnection brings. They allow for the study of SPADs at various excess voltages and threshold levels to explore all relevant parameters for PDE, noise and timing performance. All this accounts for a relatively long design cycle before a 3D PDC (or any other 3D imaging sensor) is obtained.
Even when the device development is based on established process flows, recipes, capabilities and tools from the partner foundry, key risks must be identified and mitigated. For example, short fabrication loops are used to detect issues with the process and to provide solutions if needed. It is imperative to have strong ties with a foundry, as the process flow will be based mainly on the foundry’s recipes, capabilities and available tools. Also, whereas the CMOS can be prototyped at relatively low cost through MPW runs, developing a SPAD array in a commercial foundry requires significant funding. Another variable is the time required. Even with decent funding, it is not trivial to be a high-priority project when competing with the commercial products being fabricated in the foundry.