A Pixel Design of a Branching Ultra-Highspeed Image Sensor

A burst image sensor named Hanabi, meaning fireworks in Japanese, includes a branching CCD and multiple CMOS readout circuits. The sensor is backside-illuminated with a light/charge guide pipe to minimize the temporal resolution by suppressing the horizontal motion of signal carriers. On the front side, the pixel has a guide gate at the center, branching to six first-branching gates, each bifurcating to second-branching gates, and finally connected to 12 (=6×2) floating diffusions. The signals are either read out after an image capture operation to replay 12 to 48 consecutive images, or continuously transferred to a memory chip stacked on the front side of the sensor chip and converted to digital signals. A CCD burst image sensor enables a noiseless signal transfer from a photodiode to the in-situ storage even at very high frame rates. However, the pixel count conflicts with the frame count due to the large pixel size for the relatively large in-pixel CCD memory elements. A CMOS burst image sensor can use small trench-type capacitors for memory elements, instead of CCD channels. However, the transfer noise from a floating diffusion to the memory element increases in proportion to the square root of the frame rate. The Hanabi chip overcomes the compromise between these pros and cons.


Introduction
In 1991, Etoh developed a video camera with a parallel readout high-speed image senor of 4500 fps [1]. In response to requests from users of this camera for a much higher frame rate, Etoh and Takehara proposed an image sensor concept with in-pixel multiple analogue storage elements, targeting 30 Mfps in 1992 [2]. The paper also proposed an inpixel video trigger concept which automatically stops a continuous overwriting operation upon a sudden change in a captured image. In 1996, Kosonocky et al. achieved 0.5 Mfps with an image sensor with in-pixel series-parallel-series (SPS) CCD storage [3]. He named such an image sensor with in-pixel storage for ultra-highspeed imaging "a burst image sensor". In 2001, Etoh et al. reached 1 Mfps for the first time with an in-situ storage image sensor with a slanted linear CCD for each pixel [4]. In 2018, Etoh et al. developed a multiframing camera with the frame interval ∆t = 10 ns (equivalent to 100 Mfps) [5]. When a chip with pixel-based driver circuits is stacked on the sensor chip with a 3D-stacking technology, the frame rate will reach 1 Gfps (∆t = 1 ns) [6]. Kuroda  large number of frames when the 3D-stacking is applied [7]. It took 26 years to go beyond the frame rate of 30 Mfps initially targeted in 1992. The theoretical highest frame rate of silicon image sensors is 90.1 Gfps (∆t = 11.1 ps) [8]. Recently, Ngo et al. proposed an image sensor concept with a germanium photodiode to achieve a frame rate higher than the theoretical limit of silicon image sensors. They named it super-temporal resolution (STR) image sensor [9]. Burst image sensors and STR imaging technologies achieved by other scientists are summarized in literature [9][10][11][12], including several historically important achievements [13][14][15][16][17][18][19][20].
A fundamental problem of burst image sensors is a trade-off between pixel count and frame count. Since the imaging area of an image sensor is fixed by standard formats, a large number of in-pixel storage elements for a large frame count increases the size of a pixel, reducing the pixel count. A 3D-stacking technology enables many storage elements in a stacked memory chip. However, a signal transfer rate from the sensor chip to the memory chip has another trade-off, this time, with signal transfer noise, since the noise level is proportional to the square root of the transfer rate.
To mitigate these tradeoffs, we propose a backside-illuminated (BSI) CCD/CMOS hybrid branching image sensor called Hanabi. The sensor has radially branching CCD channels and multiple CMOS readout circuits surrounding the channels as shown in Figure 1. (1) the hexagonal branching gates allow design of many storage elements in a proper balance of the areas of the gates, and (2) when the crystal orientation of the backside surface is (100), proper etching creates (100) surfaces on each of the walls of the square guide pipe.
While the configuration with mixed shapes makes the design complicated, we gave priority to better performance over saving the labors for the design.
The cascade branching begins from a guide gate (GG), hexa-furcating to first-branching gates (FG), each bifurcating to second-branching gates (SG), each connected to a floating diffusion (FD) for storage and readout of signal electrons, and further connected to a reset gate (RS) and a reset drain (RD).
The gates (electrodes) are systematically named as shown in Figure 1b. The firstbranching gates are named FG1 to FG6, and also named FGs as a whole. Second-branching gates connected to FG1 are named SG11 and SG12; the first digit refers to the FG to which the second-branching gates are connected, and the second digit is numbered clockwise. Second-branching gates are named SGs as a whole. Likewise, the floating diffusions (FDs) and reset gates (RSs) are numbered. However, for burst imaging, numbering RSs is not necessary since signals are reset at once.  Figure 2 shows the cross-sections of BSI image sensors with a functional circuit layer on the front side which we previously proposed in [5]. Three technologies, a charge collection pyramid, a light/charge guide pipe, and a p-well have been proposed for (1) collection of signal electrons to the center of the pixel and (2) potential separation of the upper photodiode layer and the lower functional circuit layer. Hereafter, a light/charge guide pipe and a charge collection pyramid are shortened to a guide pipe and a pyramid.
The p-well separation was applied to the development of the following: (1) a burst image sensor which achieved the frame interval of 10 ns, and (2) an image signal accumulation image sensor (ISAS) which captures 1220 consecutive images at the frame interval of 40 ns, and can accumulate image signals by repeating image capturing of reversible events with very low incident light [20].
The guide pipe and the pyramid significantly increase frame rates [5], whereas the circuits under the structures are the same. The pyramid has a fill factor of 100% without an on-chip micro-lens. However, the process of the image sensor with the pyramid is still under development. On the other hand, the guide pipe can be made with existing process Hanabi will bring innovations to advanced scientific measurement technologies, including fluorescence lifetime imaging microscopy (FLIM), and imaging time-of-flight mass spectrometry (imaging TOF MS). In a pump-probe analysis, for example, an ultra-short laser pulse is applied to a specimen to excite electron states, and a phenomenon triggered by the laser shot is probed with a certain delay. Repeating the measurement for a stepwise change of the delay provides the temporal change of the phenomenon. The temporal resolution decreases to a femtosecond, depending on the width of the laser pulse. Hanabi will enable the measurement of the temporal change by one shot without repetition with the temporal resolution down to sub-nanoseconds.
This paper presents a feasibility study of the following aspects of Hanabi: (1) fundamental structure and operation schemes, (2) design conditions and criteria to evaluate basic performance such as frame rate, spatio-temporal crosstalk, and collision rate against walls of the guide pipe which is explained later. (3) evaluation of the basic performance related to the guide pipe and the front-side circuit layer, separately, by potential and Monte Carlo simulations, (4) overall performance for the combined structure of the guide pipe and the circuit layer.
The paper will also discuss super-temporal-resolution in Hanabi with a germanium photodiode, and finally summarize major contributions of this analysis.  Figure 1a shows the cross section of Hanabi, which is composed of an upper square silicon pipe, receiving light incident to the backside, and a lower circuit layer on the front side. Figure 1b shows the configuration of the elements on the front side. The sensor has multiple CCD branching gates radially arranged from the center of the pixel.

Overview of Branching
The planar shapes consist of a square for the guide pipe and a hexagon for the branching gates. It is easier to design a sensor composed of only square or hexagonal shapes. However, a preliminary analysis showed the advantages of the combined square and hexagonal structures as follows: (1) the hexagonal branching gates allow design of many storage elements in a proper balance of the areas of the gates, and (2) when the crystal orientation of the backside surface is (100), proper etching creates (100) surfaces on each of the walls of the square guide pipe.
While the configuration with mixed shapes makes the design complicated, we gave priority to better performance over saving the labors for the design.
The cascade branching begins from a guide gate (GG), hexa-furcating to first-branching gates (FG), each bifurcating to second-branching gates (SG), each connected to a floating diffusion (FD) for storage and readout of signal electrons, and further connected to a reset gate (RS) and a reset drain (RD).
The gates (electrodes) are systematically named as shown in Figure 1b. The firstbranching gates are named FG 1 to FG 6 , and also named FGs as a whole. Second-branching gates connected to FG 1 are named SG 11 and SG 12 ; the first digit refers to the FG to which the second-branching gates are connected, and the second digit is numbered clockwise. Second-branching gates are named SGs as a whole. Likewise, the floating diffusions (FDs) and reset gates (RSs) are numbered. However, for burst imaging, numbering RSs is not necessary since signals are reset at once. Figure 2 shows the cross-sections of BSI image sensors with a functional circuit layer on the front side which we previously proposed in [5]. Three technologies, a charge collection pyramid, a light/charge guide pipe, and a p-well have been proposed for (1) collection of signal electrons to the center of the pixel and (2) potential separation of the upper photodiode layer and the lower functional circuit layer. Hereafter, a light/charge guide pipe and a charge collection pyramid are shortened to a guide pipe and a pyramid. technologies. Therefore, in this paper, the guide pipe is employed to mainly study the feasibility and the performance of the multiple branching structures. In this study, the guide pipe is surrounded by a silicon dioxide insulation layer. The outside can be crystal silicon, polysilicon, or other suitable materials.
To avoid collision of signal electrons against the walls of the guide pipe, a layer near   The p-well separation was applied to the development of the following: (1) a burst image sensor which achieved the frame interval of 10 ns, and (2) an image signal accumulation image sensor (ISAS) which captures 1220 consecutive images at the frame interval of 40 ns, and can accumulate image signals by repeating image capturing of reversible events with very low incident light [20].
The guide pipe and the pyramid significantly increase frame rates [5], whereas the circuits under the structures are the same. The pyramid has a fill factor of 100% without an on-chip micro-lens. However, the process of the image sensor with the pyramid is still under development. On the other hand, the guide pipe can be made with existing process technologies. Therefore, in this paper, the guide pipe is employed to mainly study the feasibility and the performance of the multiple branching structures.
In this study, the guide pipe is surrounded by a silicon dioxide insulation layer. The outside can be crystal silicon, polysilicon, or other suitable materials.
To avoid collision of signal electrons against the walls of the guide pipe, a layer near each wall is to be negatively charged relative to the potential value in the bulk of the guide pipe. Several methods are available for creating the negative potential layer, including: (1) a thin boron-doped layer over the walls of the guide pipe, (2) a low voltage applied to the outside of the guide pipe, and, (3) a negatively charged thin layer over the insulation layer, made with Al 2 O 3 .
In this paper, a boron layer is employed. Some electrons still collide against the wall. Most of them are reflected to the inside, but some are trapped on the wall and later released or recombined with holes. The number of activated boron ions per wall area (the concentration multiplied by the thickness) was decided to reduce the collision probability to an acceptable level or better as shown later. In this paper, it is assumed that an electron once colliding with the wall is absorbed, which is a safe assumption for a conservative design.

Equivalent Travel Routes of Signal Electrons
Let a collecting FD C and waiting FD W s respectively denote the FD which collects signal electrons and other FDs waiting for their turn. The sensor includes twelve SGs and twelve FDs. Most of signal electrons arrive at GG. However, some fall on FGs and SGs due to the random motion. All these electrons flow down from eleven low-voltage (upstream) SGs toward a high-voltage (downstream) SG, and finally to FD C . Therefore, 132 (=12 × 11) electron travel routes form the upstream SGs to the FD C . However, if the planar pixel configuration is symmetric, all the routes are represented by one equivalent route. For the combination of the square guide pipe and the hexagonal gate, the number of equivalent routes increases as shown in Figure 3: (1) from the upstream SGs to GG: there are three representative routes, R A , R B and R C for electrons to converge to GG, as color-coded in Figure 3b; for example, the green routes from SG 21 (R B ), SG 32 , SG 51 and SG 62 to GG are equivalent, since all these routes start from the corner of the guide gate, where the numbers of the gates are shown on FDs in Figure 3a, and (2) from GG to FD C through an FG and an SG downstream: there are three representative routes, R D , R E and R F , for the collected electrons to verge, as shown in Figure 3c.
Therefore, there are nine (=3 × 3) different equivalent routes. As shown later, slight differences among the downstream representative routes causes a serious difference on arrival times of electrons.

Operation Schemes
The sensor has two operation modes, burst imaging and continuous imaging. In burst imaging mode, upon photon detection, signal electron packets are stored in twelve FDs, which are converted to voltage signals and read out of the sensor or transferred to a stacked memory chip all at once after completion of the capture of a finite sequence of frames. In this mode, the frame count is only 12, however, one can bin multiple neighboring pixels to make a larger 'macro pixel' with the objective of increasing the frame count. For example, an interlace operation, in which image signals are alternately captured with even-column pixels and odd-column pixels doubles the frame count to 24, sacrificing the pixel resolution and the sensitivity in half. And macro pixel with four adjacent pixels quadruplicates the pixel count to 48 and makes the resolution 1/4.
The burst imaging mode has the following advantages: (1) when the sensor is cooled, the buried CCD channels theoretically enable a perfect and noiseless signal transfer even at ultrafast signal transfer, if the field is less than the critical field at which the drift velocity is saturated, and (2) noiseless signal accumulation is possible for repetitive burst imaging of reproducible events emitting very weak light: Alternatively, continuous imaging mode by Hanabi with a stacked memory chip has the following advantages: (1) a large frame count, and (2) the transfer rate of signal electrons from each FD to in-situ storage on the stacked chip can be reduced by 1/12 of that of an image sensor with one FD per pixel, reducing the transfer noise by 1/√12.
However, a large number of wires are necessary to independently operate FDs and reset gates in addition to the CCD electrodes. The number of wires for the reset gates reduces in alternate continuous imaging, in which image signals stored in the twelve FDs on the odd-number columns are simultaneously transferred to the stacked memory chip during successive image capturing by the pixels on the even-number columns, and vice versa. In a pixel, a similar operation can be introduced by grouping the FDs.

Theoretical and Practical Temporal Resolutions
We proposed the definitions of the theoretical temporal resolution and the practical temporal resolution , assuming a Gaussian distribution for the arrival time distribution of signal electrons to a detection plane. They are defined by means of the standard

Operation Schemes
The sensor has two operation modes, burst imaging and continuous imaging. In burst imaging mode, upon photon detection, signal electron packets are stored in twelve FDs, which are converted to voltage signals and read out of the sensor or transferred to a stacked memory chip all at once after completion of the capture of a finite sequence of frames. In this mode, the frame count is only 12, however, one can bin multiple neighboring pixels to make a larger 'macro pixel' with the objective of increasing the frame count. For example, an interlace operation, in which image signals are alternately captured with even-column pixels and odd-column pixels doubles the frame count to 24, sacrificing the pixel resolution and the sensitivity in half. And macro pixel with four adjacent pixels quadruplicates the pixel count to 48 and makes the resolution 1/4.
The burst imaging mode has the following advantages: (1) when the sensor is cooled, the buried CCD channels theoretically enable a perfect and noiseless signal transfer even at ultrafast signal transfer, if the field is less than the critical field at which the drift velocity is saturated, and (2) noiseless signal accumulation is possible for repetitive burst imaging of reproducible events emitting very weak light: Alternatively, continuous imaging mode by Hanabi with a stacked memory chip has the following advantages: (1) a large frame count, and (2) the transfer rate of signal electrons from each FD to in-situ storage on the stacked chip can be reduced by 1/12 of that of an image sensor with one FD per pixel, reducing the transfer noise by 1/ √ 12.
However, a large number of wires are necessary to independently operate FDs and reset gates in addition to the CCD electrodes. The number of wires for the reset gates reduces in alternate continuous imaging, in which image signals stored in the twelve FDs on the odd-number columns are simultaneously transferred to the stacked memory chip during successive image capturing by the pixels on the even-number columns, and vice versa. In a pixel, a similar operation can be introduced by grouping the FDs.

Theoretical and Practical Temporal Resolutions
We proposed the definitions of the theoretical temporal resolution t T and the practical temporal resolution t P , assuming a Gaussian distribution for the arrival time distribution of signal electrons to a detection plane. They are defined by means of the standard deviation σ of the distribution, an average penetration depth δ of incident light with a certain wavelength to a photodiode material, and the length of the photodiode L.
The theoretical temporal resolution is defined as t T = 2σ for L = δ [8,9]. For the first condition, two peaks of a sum of ordinates of two Gaussian distributions are combined to one peak with no dip at the center. Therefore, it is impossible to separately detect the signal electron packets which are generated by two pulses of light with the time difference of t T incident to the backside, and spread by diffusion and mixing toward the detection plane. The second condition, L = δ, means that the thickness of the photodiode is equal to the average penetration depth.
However, for these conditions, overlap of the two Gaussian distributions results in a large temporal crosstalk, and 36.8% (=e −1 ) of incident light passes through the front side, and is wasted.
On the other hand, the practical temporal resolution is defined as t P = 3.3σ for L = 3δ. When the arrival time distribution is approximated by the Gaussian distribution, the first condition is equivalent to t P = t 95 − t 5 , shown in Figure 4, where t 95 and t 5 are respectively the times when 95% and 5% of generated signal electrons arrive at the detection plane. Therefore, the temporal crosstalk is 5%. For L = 3δ, 95% of the incident light is converted to signal electrons. In this paper, the practical temporal resolution t P is employed to evaluate the sensor design.
Sensors 2021, 21, x FOR PEER REVIEW 6 of 20 deviation of the distribution, an average penetration depth of incident light with a certain wavelength to a photodiode material, and the length of the photodiode .
The theoretical temporal resolution is defined as = 2 for = [8,9]. For the first condition, two peaks of a sum of ordinates of two Gaussian distributions are combined to one peak with no dip at the center. Therefore, it is impossible to separately detect the signal electron packets which are generated by two pulses of light with the time difference of incident to the backside, and spread by diffusion and mixing toward the detection plane. The second condition, = , means that the thickness of the photodiode is equal to the average penetration depth.
However, for these conditions, overlap of the two Gaussian distributions results in a large temporal crosstalk, and 36.8% (= e −1 ) of incident light passes through the front side, and is wasted.
On the other hand, the practical temporal resolution is defined as = 3.3 for = 3 . When the arrival time distribution is approximated by the Gaussian distribution, the first condition is equivalent to ′ = 95 − 5 , shown in Figure 4, where 95 and 5 are respectively the times when 95% and 5% of generated signal electrons arrive at the detection plane. Therefore, the temporal crosstalk is 5%. For = 3 , 95% of the incident light is converted to signal electrons. In this paper, the practical temporal resolution ′ is employed to evaluate the sensor design.

Wavelengths of Incident Light and Fundamental Shape Parameters
The wavelengths of blue, green, and red light are assumed to be 450,550 and 650 nm. The wavelength of 550 nm for green light is close to that of the mercury e line, 546.07 nm which is used as a representative wavelength for visible light. Red light requires a longer photodiode, and, thus, a conservative design. Therefore, the green light and red light are used in the following analyses.
On the basis of a 120-nm process, fundamental shape parameters are determined as in Table 1. The layout of the front side shown in Figure 1b was drawn first based on our design experience and adjusted in a preliminary design with potential and Monte Carlo (MC) simulations. The pixel size is 11.4 μm × 11.4 μm. The design has an empty space beside a couple of reset drains. If these spaces are used for one drain for neighboring two FDs, the pixel size is reduced to about 10 μm. Furthermore, a checkerboard pixel arrangement rotated by 45 degrees can reduce the pixel pitch to 7.07 μm (= 10/√2 μm). If the imaging area is 11.4 mm × 11.4 mm and the pixel pitch is 11.4 μm, the pixel count reaches 1,000,000.
The size of the guide pipe is 5 × 5 μm 2 , and, thus, the fill factor is 19.2%. Therefore, an on-chip micro-lens is necessary. In the case, the direction of light is inclined toward the

Wavelengths of Incident Light and Fundamental Shape Parameters
The wavelengths of blue, green, and red light are assumed to be 450,550 and 650 nm. The wavelength of 550 nm for green light is close to that of the mercury e line, 546.07 nm which is used as a representative wavelength for visible light. Red light requires a longer photodiode, and, thus, a conservative design. Therefore, the green light and red light are used in the following analyses.
On the basis of a 120-nm process, fundamental shape parameters are determined as in Table 1. The layout of the front side shown in Figure 1b was drawn first based on our design experience and adjusted in a preliminary design with potential and Monte Carlo (MC) simulations. The pixel size is 11.4 µm × 11.4 µm. The design has an empty space beside a couple of reset drains. If these spaces are used for one drain for neighboring two FDs, the pixel size is reduced to about 10 µm. Furthermore, a checkerboard pixel arrangement rotated by 45 degrees can reduce the pixel pitch to 7.07 µm (= 10/ √ 2 µm). If the imaging area is 11.4 mm × 11.4 mm and the pixel pitch is 11.4 µm, the pixel count reaches 1,000,000. The size of the guide pipe is 5 × 5 µm 2 , and, thus, the fill factor is 19.2%. Therefore, an on-chip micro-lens is necessary. In the case, the direction of light is inclined toward the center. However, in the simulations, the light is assumed to be perpendicularly incident to the backside for a conservative design.
The total thickness of the sensor is assumed to be (3δ + 0.1 µm). The thickness of 3δ is for the vertical length of the photodiode, in which 95% of incident light is converted to electrons. The thickness of 0.1 µm is added for a p + (boron) layer for the backside hole accumulation layer. The total thickness was allocated to the length of the guide pipe and the thickness of the circuit layer.
The total thickness is 5.3 µm for green light and 12.1 µm for red light. Hereafter, the models with the total thicknesses are named Model M with a medium-sized photodiode, and Model L with a long photodiode. Table 2 shows the performance parameters aimed in the design. Since this paper shows a feasibility study of the branching image sensor, the acceptable values of the parameters are shown as well as the theoretical limits or ideal values. The values in the table are estimated, based on the current technologies and the trend in the scientific field as follows:

Design Targets
(1) Temporal resolution: the theoretical limit for silicon image sensors is 11.1 ps; however, as the existing highest temporal resolution of image sensors for visible light is about 10 ns, the resolution of 1 ns may be acceptable for a provisional target; in the case, the readout rate from each FD to a stacked memory chip is 6 ns (=1/12 × 2) for a correlated double sampling (CDS) operation, which may be achieved in the near future, (2) Frame count: fifty frames enable a replay of images at 10 fps for 5 s; the replay at 10 fps looks smoothly due to a saccade motion of our eyes with the lowest frequency less than 10 Hz; the duration of 5 s is at least necessary to activate dynamic recognition of our eyes; on the other hand, as televisions replay images at 30 fps (or 25 fps), a sequence of 300 frames enables replay for 10 (12) seconds for a standard TV, (3) Pixel count: the acceptable level may be 256 × 256 pixels (65, 536 pixels), which is rounded to 100,000 pixels; the pixel count for 1000 × 1000 is sufficient for most scientific researches, (4) E-field: the critical field, 25 kV/cm, is ideal, at which the drift velocity is 95% of the saturation drift velocity as shown in Figure 5 [9]; up to about 100 kV/cm, the drift velocity is almost constant, and dark current increases little, because excessive photon energy is mostly converted to a lattice vibration (generation of phonon or heat); on the other hand, at 5 kV/cm, the drift velocity reduces only to the half of the saturation drift velocity. (5) Other performance parameters: 10% may be acceptable. with one billion pixels is observing a wide sky. The transfer rate per readout signal line of the sensor is almost the same as that in advanced continuous-readout high-speed image sensors.

Destinations of Photo-Electrons
Most electrons generated by incident light directly move to the collecting FDC to create a signal electron packet, but others travel to different waiting FDws. Table 3 shows the destinations of generated electrons with their symbols.
In the guide pipe, some electrons collide with the wall or return to the backside due to the random motion. They are reflected or trapped at the wall. Trapped ones are released later or recombined with holes. The reflected ones move as normal electrons again. It is difficult to estimate the reflection ratio. Therefore, it is tentatively assumed that the reflection ratio is zero, i.e., electrons once colliding against the wall or the backside disappear. On the front side, most electrons surviving collisions are collected by the collecting gate. Others migrate to FDWs or to neighboring pixels. Some applications of ultra-high-speed imaging require much higher performance levels than the ideal levels in Table 2. For example, an ideal frame count is 4000 frames for an imaging TOF MS (time-of-flight mass spectrometry). In astronomy, an image sensor with one billion pixels is observing a wide sky. The transfer rate per readout signal line of the sensor is almost the same as that in advanced continuous-readout high-speed image sensors.

Destinations of Photo-Electrons
Most electrons generated by incident light directly move to the collecting FD C to create a signal electron packet, but others travel to different waiting FDws. Table 3 shows the destinations of generated electrons with their symbols.
In the guide pipe, some electrons collide with the wall or return to the backside due to the random motion. They are reflected or trapped at the wall. Trapped ones are released later or recombined with holes. The reflected ones move as normal electrons again. It is difficult to estimate the reflection ratio. Therefore, it is tentatively assumed that the reflection ratio is zero, i.e., electrons once colliding against the wall or the backside disappear. On the front side, most electrons surviving collisions are collected by the collecting gate. Others migrate to FD W s or to neighboring pixels.
For example, the collection rate R C and the migration rate R mg denote the portions of electrons arriving at a FD C and FD W s. The sum of the six factors in Table 4 is unity.

Guide Pipe
Hereafter, the sensor performance at the pixel level is optimized separately for the guide pipe and for the circuit layer. Later, the overall performance will be evaluated. This section discusses the guide pipe design and performance.
The average vertical E-field at the center of the guide pipe is fixed at 25 kV/cm. Then, the standard deviation of the travel time of electrons inside the guide pipe depends on the length L GP , the width W GP , and the concentration p GP of boron ions implanted in thin layers over the walls of the guide pipe. A narrow and tall guide pipe increases the collision probability of electrons to the wall. The higher boron concentration forces electrons to move toward the center, thus decreasing the collision probability, and, yet increasing the arrival time. Figure 6 shows the potential contours in cross-sections for (a) Model M, and (b) Model L for different boron concentrations from 0 to 4 × 10 16 cm −3 . Figure 7 shows the vertical potential profiles at the center of the pixel. The channel potential value is shown at the highest value close to the surface where the line is crooked. Figures 6 and 7 show that: (1) with no wall boron doping, the contours in the guide pipe are parallel, and the vertical potential profile is almost linear with the field close to the average 25 kV/cm, (2) with the boron doping, the contours near the walls bend upward so that the field is directed to the center to reduce the collision probability of electrons; the vertical potentials bend upward, decreasing the fields in the backside layer, and increasing those in the front-side layer, (3) for Model M with a high boron concentration of 4.0 × 10 16 cm −3 , the fields in the backside layer and the front-side layer are respectively15 kV/cm and 34.4 kV/cm which are still within the acceptable range shown in Table 2, (4) for Model L for the same boron concentration, the field in the backside layer is 1.6 kV/cm, which is too small, resulting in a high backside returning rate, and the field in the front-side layer 79.3 kV/cm, which is significantly large, resulting in a slight increase of dark current, and (5) a higher boron concentration for Model L loops the contours in the circuit layer as shown in circles in Figure 6(b-2),(b-3), hampering delivery of signal electrons to the storage areas, which requires detailed modifications in the design.
guide pipe and for the circuit layer. Later, the overall performance will be evaluated. This section discusses the guide pipe design and performance. The average vertical E-field at the center of the guide pipe is fixed at 25 kV/cm. Then, the standard deviation of the travel time of electrons inside the guide pipe depends on the length , the width , and the concentration of boron ions implanted in thin layers over the walls of the guide pipe. A narrow and tall guide pipe increases the collision probability of electrons to the wall. The higher boron concentration forces electrons to move toward the center, thus decreasing the collision probability, and, yet increasing the arrival time. Figure 6 shows the potential contours in cross-sections for (a) Model M, and (b) Model L for different boron concentrations from 0 to 4 × 10 16 cm −3 . Figure 7 shows the vertical potential profiles at the center of the pixel. The channel potential value is shown at the highest value close to the surface where the line is crooked.  (1) with no wall boron doping, the contours in the guide pipe are parallel, and the vertical potential profile is almost linear with the field close to the average 25 kV/cm, (2) with the boron doping, the contours near the walls bend upward so that the field is directed to the center to reduce the collision probability of electrons; the vertical potentials bend upward, decreasing the fields in the backside layer, and increasing those in the front-side layer,  Table 2, (4) for Model L for the same boron concentration, the field in the backside layer is 1.6 kV/cm, which is too small, resulting in a high backside returning rate, and the field in the front-side layer 79.3 kV/cm, which is significantly large, resulting in a slight increase of dark current, and (5) a higher boron concentration for Model L loops the contours in the circuit layer as  (1) the collision rate R col efficiently decreases for the wall boron concentration p GP from 0 to 1.0 × 10 16 cm −3 , and slightly reduces and stays less than 8% for both green and red light, which satisfies the acceptable level of 10%, (2) for Model M, the standard deviations σ is less than 20.0 ps; for Model L, σ is less than 30 ps for p GP < 1.0 × 10 16 cm −3 , but, suddenly increases to more than 200 ps at p GP = 4.0 × 10 16 cm −3 , and

Modulation Functions
The voltages applied to gates are estimated by using modulation functions, i.e., the relations between the gate voltages and the channel potentials at the gates. A systematic procedure to estimate the gate voltages is proposed. Model L is used in the following explanation. The same procedure can be applied to the gate voltages of Model M. Figure 9a,b shows the positions where the modulation functions are evaluated, which are the centers of GG, FG, and SG. The pinning gate voltages are about -6 V. When the gate voltages are above the pinning condition, the modulation functions linearly increase as well known. The modulation factors are 0.88 for GG, and 0.82 for FG, decreasing to 0.65 at SG due to the narrow channel effect. Pinning voltages and the corresponding potential values for the gates slightly change due to the narrow channel effect for the same dopant condition over the whole area. Table 5 summarizes the modulation functions.  Table 4 summarizes the collision rate and the temporal resolutions in the optimum case.

Modulation Functions
The voltages applied to gates are estimated by using modulation functions, i.e., the relations between the gate voltages and the channel potentials at the gates. A systematic procedure to estimate the gate voltages is proposed. Model L is used in the following explanation. The same procedure can be applied to the gate voltages of Model M. Figure 9a,b shows the positions where the modulation functions are evaluated, which are the centers of GG, FG, and SG. The pinning gate voltages are about −6 V. When the gate voltages are above the pinning condition, the modulation functions linearly increase as well known. The modulation factors are 0.88 for GG, and 0.82 for FG, decreasing to 0.65 at SG due to the narrow channel effect. Pinning voltages and the corresponding potential values for the gates slightly change due to the narrow channel effect for the same dopant condition over the whole area. Table 5 summarizes the modulation functions.    Figure 10a conceptually shows an ideal channel potential profile from FDw to FDc. The ideal horizontal potential profile linearly increases from an upstream (low-voltage) SG toward a downstream (high-voltage) SG. From the channel potential value and the modulation function at each gate, the gate voltage is uniquely decided as follows: (1) assume the amplitude of the SG voltage is 3 V, and the pinning voltage is −6 V, and, then, the low and high voltages of the SG are −6 V-3 V, (2) from the modulation function of SG, the channel potentials at an upstream SG and a downstream SG are −3.73 V and −1.78 V, (3) the gate voltages of GG and upstream and downstream FGs are interpolated from Figure10a as −5.19 V and −3.84 V, and, then, the amplitude of FG is 1.35 V.
In Table 6, and Figure 10b, the estimated potential values are compared with those calculated by a potential simulation with the gate voltages. The errors are 0.9% to 13.8%, which proves a high efficiency of the proposed simple estimation method of the gate voltages.
The voltage amplitude of FG, 1.35 V, may be allowable for semiconductor circuits driven at a very high frequency. The voltage amplitude of SG, 3 V, is higher than that of FG. However, the operation rate of SG can be 1/6 or 1/12 of that of FG, and the gate area of one SG is only 45% (1.14 μm 2 /2.49 μm 2 ) of that of FG. Therefore, if FG can be driven at 1.35 V amplitude, SG with a 3 V amplitude can be easily operated.
The average field is 3.05 kV/cm (= 1.95 V/6.38 μm).  Figure 10a conceptually shows an ideal channel potential profile from FDw to FDc. The ideal horizontal potential profile linearly increases from an upstream (low-voltage) SG toward a downstream (high-voltage) SG. From the channel potential value and the modulation function at each gate, the gate voltage is uniquely decided as follows:

Gate Voltages and Potentials of the Channels
(1) assume the amplitude of the SG voltage is 3 V, and the pinning voltage is −6 V, and, then, the low and high voltages of the SG are −6 V-3 V,   In Table 6, and Figure 10b, the estimated potential values are compared with those calculated by a potential simulation with the gate voltages. The errors are 0.9% to 13.8%, which proves a high efficiency of the proposed simple estimation method of the gate voltages. The voltage amplitude of FG, 1.35 V, may be allowable for semiconductor circuits driven at a very high frequency. The voltage amplitude of SG, 3 V, is higher than that of FG. However, the operation rate of SG can be 1/6 or 1/12 of that of FG, and the gate area of one SG is only 45% (1.14 µm 2 /2.49 µm 2 ) of that of FG. Therefore, if FG can be driven at 1.35 V amplitude, SG with a 3 V amplitude can be easily operated.

Voltage Adjustment
(1) the average field is 3.05 kV/cm as calculated from the potential line in the figure, which is about 1/10 of the ideal field, 25 kV/cm, which reduces the drift velocity of electrons to 1/3 of the saturation drift velocity as shown in Figure 5, (2) the potential profiles at the centers of the gates are rather flat, further decreasing the field and increasing the travel time, and (3) at the downstream FG, there is a slight potential dip as shown with a red circle, which seriously increases the travel time of electrons passing through the area.
On the other hand, the figure provides suggestions for modifications of the potential profile as follows: (1) a fine semiconductor process: if the same voltage amplitude is applied, a fine process increases the average field inversely proportional to the pixel size, and, in addition, portions of a steep fringe field erode the flat portions, and (2) voltage modifications: if the potentials at the downstream FG is slightly lowered, and SG is raised, the small potential dip may disappear; for the (−0.1 V) decrease for the FG and 0.3 V increase for the SG, the modulation functions suggest that gate voltages must be shifted by −0.12 V (=−0.1/0.82 V) for the FG, and 0.46 V (=0.3/0.65 V) for the SG.
Then, the potential dip disappears. The voltage amplitudes of the FG and the SG are slightly modified to 1.2 V and 3.5 V.

Detailed Adjustment of Temporal Resolution for Potential Barrier
As shown in Figure 3, there are three equivalent electron routes along the upstream SGs to GG; R A , R B and R C , and three equivalent routes from GG to downstream FD C ; R D , R E and R F . As shown in Figure 10b, the field around the upstream SG and FG is high, creating a relatively high drift velocity. On the other hand, the field along the routes in the downstream FG is relatively low even after adjustment to eliminate the slight potential dip. The downstream route is represented by one of the equivalent routes, R D , R E and R F , in Figure 3. The potential profiles along these routes are slightly different around zero, which causes a large difference in travel times of electrons passing through these routes, and seriously unbalances the arrival times of electrons depending of the routes.
This may seem a local problem. The important role of this paper is not only to show general aspects, but also to find these potential risks which may cause serious problems in development of the prototype and to provide ways to solve them. Therefore, the travel times for the routes R D , R E and R F are carefully evaluated to be shortened and balanced by MC simulations. The position of the electron source is at 1 µm above the front surface at the center of the pixel to separately evaluate the performance in the circuit layer from that in the guide pipe. Table 7 shows the results. Table 7. Travel times from the center of GG to FDc before and after detailed modification of a phosphorus mask: electrons start from 1.0 µm above the front side at the center of the pixel, * the value is used in Section 4.3.2.

Temporal Criteria
Before The ratio of the average arrival times of 231 ps and 838 ps of the routes R D and R E ps is 27.6%. Figure 11(a-1) shows the plane potential profiles of the channels and Figure 11(a-2) shows the potential profiles along the routes. The route R E shows a small dip between the downstream FG to SG, which caused the large difference in the temporal resolutions.

Fundamental Performance
The overall performance is evaluated for the combined structure with the guide pipe and the circuit layer. Green light represents visible light. In a theoretical analysis, it is appropriate to use Model M for the corresponding wavelength. However, red light incident to Model M in Table 4 cannot be fully absorbed and generates false signal electrons in the circuit layer. Therefore, Model L with incident green light is selected for safe-side evaluation in practice. Electrons are generated by incident light to the backside. The results are summarized in Table 8, showing: (1) all major performance indices satisfy the acceptable values listed in Table 2; except the frame count, (2) for example, the temporal resolution is about 386 ps, which is much shorter than the acceptable level, 1 ns; (3) about 88% of electrons are collected by a collecting FDC; actually, some portions of electrons colliding against the walls of the guide pipe and the backside are reflected To mitigate the unbalance in the travel times, fine modifications are applied to the design of a phosphorus mask as shown in Figure 11b. The improved potential profiles on the circuit layer, and the improved travel times are shown in Figure 11(c-2) and Table 7. Table 7 shows that the travel times from the center of the pixel on the front side to the FDs are 212 ps to 231 ps, which improve the ratio of the balance to 91.8%.

Fundamental Performance
The overall performance is evaluated for the combined structure with the guide pipe and the circuit layer. Green light represents visible light. In a theoretical analysis, it is appropriate to use Model M for the corresponding wavelength. However, red light incident to Model M in Table 4 cannot be fully absorbed and generates false signal electrons in the circuit layer. Therefore, Model L with incident green light is selected for safe-side evaluation in practice. Electrons are generated by incident light to the backside. The results are summarized in Table 8, showing: (1) all major performance indices satisfy the acceptable values listed in Table 2; except the frame count, (2) for example, the temporal resolution is about 386 ps, which is much shorter than the acceptable level, 1 ns; (3) about 88% of electrons are collected by a collecting FD C ; actually, some portions of electrons colliding against the walls of the guide pipe and the backside are reflected and join the collected electrons; therefore, the real collection rate increases to more than 90%. The frame count is 12 for the burst imaging, and can be increased to 48 for a macropixel operation with four neighboring elemental pixels, sacrificing the sensitivity to 1/4. In the near future, the continuous imaging can be introduced by transferring image signals from 12 FDs in turn to a stacked memory chip, keeping a reasonable noise level by reducing the transfer rate with the multiple FDs. Figure 12 shows example trajectories of the signal electrons generated with MC simulations for the modified design of Model L and green light of 550 nm. Light is perpendicularly applied to the backside. The figures show that:

Visual Diagonosis with MC Simulations
(1) as shown in Figure 12a, most of signal electrons generated by incident light forms a bundle, which is effectively squeezed to the center of guide pipe to minimize the collision rate, (2) most of them fall onto the guide gate and transferred to FD C , (3) however, some electrons take extraordinary long travel times to increase the temporal resolution, as shown with A and B, and (4) few of them migrate to FD W as shown with C.

Metal Wiring and 3D Stacking of Driver Chip
This paper has mainly discussed a potential design and temporal resolution analyses. There are many other factors relating to the feasibility of an image sensor developed with these new concepts. For Hanabi, a large number of metal wires increases the chip area. A narrow contact pitch to stack a driver chip on the sensor chip seriously decreases yield.
Burst imaging with Hanabi requires at least 34 metal contacts, including 19 (= 1 + 6 + 12) gate electrodes and 12 signal lines from the FDs. From our experience, the minimum pixel size restricted by the number of wires is approximately estimated by (the process node) × 3 × (the number of metal contacts). For the current layout with a 120-nm process, the product makes 12.24 μm, which is close to 11.4 μm of the pixel size shown in Table  8. Therefore, the number of wires does not seriously restrict the feasibility of the burst imaging mode, but may require a slightly large pixel. More wires are necessary to expand the concept to the continuous imaging Hanabi, which requires a larger pixel size or a finer process.
The driver circuit is installed in a stacked bottom tier. A pixel of Hanabi has 21 highspeed gate electrodes. Therefore, the pitch of the Cu-Cu contacts in each pixel is 2.49 μm (= 11.4/√21), which can be processed with the existing technology. Additional discussions in this section have further confirmed the overall feasibility of Hanabi.

Higher Frame Rate: Toward Super Temporal Resolution
Theoretically, Hanabi can achieve noiseless imaging. The pixel count and the frame rate can be increased by downward scaling. The pixel count increases in inverse proportion to the square of the pixel size. Further increase of the frame rate is discussed here in relation to the scaling.
The standard deviation of the total arrival time from the generation site in the guide pipe to FDC is expressed as follows: (1) The example trajectory A is caused by electrons generated and wandering in a lowfield layer created by a thin boron doped layer at the backside. A and B are caught at a local no field on the front side. The problem at the backside layer may be mitigated with a negatively charged thin layer of Al 2 O 3 over the backside insulation layer replacing the thin boron layer. In the simulation, the direction of light is perpendicular to the backside. An on-chip micro-lens may efficiently suppress the collision against the wall and the migration to FD W s. MC simulations are very useful to visually diagnose local problems and modify the design.

Metal Wiring and 3D Stacking of Driver Chip
This paper has mainly discussed a potential design and temporal resolution analyses. There are many other factors relating to the feasibility of an image sensor developed with these new concepts. For Hanabi, a large number of metal wires increases the chip area. A narrow contact pitch to stack a driver chip on the sensor chip seriously decreases yield.
Burst imaging with Hanabi requires at least 34 metal contacts, including 19 (=1 + 6 + 12) gate electrodes and 12 signal lines from the FDs. From our experience, the minimum pixel size restricted by the number of wires is approximately estimated by (the process node) × 3 × (the number of metal contacts). For the current layout with a 120-nm process, the product makes 12.24 µm, which is close to 11.4 µm of the pixel size shown in Table 8. Therefore, the number of wires does not seriously restrict the feasibility of the burst imaging mode, but may require a slightly large pixel. More wires are necessary to expand the concept to the continuous imaging Hanabi, which requires a larger pixel size or a finer process.
The driver circuit is installed in a stacked bottom tier. A pixel of Hanabi has 21 highspeed gate electrodes. Therefore, the pitch of the Cu-Cu contacts in each pixel is 2.49 µm (=11.4 √ 21), which can be processed with the existing technology. Additional discussions in this section have further confirmed the overall feasibility of Hanabi.

Higher Frame Rate: Toward Super Temporal Resolution
Theoretically, Hanabi can achieve noiseless imaging. The pixel count and the frame rate can be increased by downward scaling. The pixel count increases in inverse proportion to the square of the pixel size. Further increase of the frame rate is discussed here in relation to the scaling.
The standard deviation σ T of the total arrival time from the generation site in the guide pipe to FD C is expressed as follows: where σ GP and σ CL are the contributions of the arrival times due to motions of signal electrons, in the guide pipe and in the circuit layer, respectively. The values of σ T and σ GP are shown in Tables 4 and 8 to be 125 ps and 29.2 ps, respectively. By substituting these values in (1), σ CL = 121.5 ps. Therefore, the temporal resolution is almost solely determined by the horizontal motion on the front side. The value is slightly larger than that for electrons starting at the pixel center, 113 ps, shown in Table 7. Further analysis is needed to eventually suppress the horizontal motion on the circuit layer increasing the total temporal resolution. There are two major causes of horizontal motion: (1) as shown in Figure 10b, the horizontal potential profile on the front side consists of flat parts in the middle of the gates, and high-field parts at the gate boudaries due to the fringing effect; the flat parts seriously elongate the standard deviation of the arrival time, and (2) as shown in Figure 12a, most electrons fall on the guide gate, and move horizontally; the different horizontal travel distance over the guide gate causes horizontal mixing.
Downward scaling mitigates both problems as follows: (1) for shorter gates, the high-field portions erode the flat parts, linealizing the whole horizontal potential profile, and (2) for a narrower guide gate, the horizontal mixing effect reduces in proportion to the size.
Potential simulations show that the horizontal potential profile becomes almost linear with no flat portion for a 1/3 model of the current one. Then, the 1.65-µm guide gate in the current layout shrinks by about 0.5-µm. Then, the standard deviation σ CL is calculated by the MC simulation for signal electrons uniformly distributed all over the guide gate. The result is as follows: (1) σ CL = 8.5 ps < σ GP = 29.2 ps (the standard deviation in the upper silicon guide pipe).
(2) the theoretical temporal resolution for the horizontal motion on the front side is 2σ CL = 17.0 ps, which is close to the theoretical temporal resolution limit of the silicon photodiode 11.1 ps [8].
Most image sensors are based on silicon technology. Therefore, we named imaging technology that overcomes the silicon limit on the frame interval, 'the super-temporalresolution (STR) imaging'. The STR Hanabi is within the scope of realistic research by replacing the photodiode material from silicon to germanium [9].
Germanium has some fundamental problems in the context of a possible integration in the Hanabi technology, such as a very high dark current. However, the dark current generated in 11.1 ps is 1/10 11 of the dark current generated in one second. Other problems, including the instability of the oxide for the insulation layer, are currently being addressed [23][24][25].

Conclusions
A burst image sensor is a standard technology for ultra-highspeed imaging, using either CCD or CMOS technologies. A BSI branching image sensor with a stacked driver chip is proposed to find a compromise between their pros and cons. A pixel of the sensor is composed of an upper photoelectron conversion layer to converge generated charges to the center, and a lower circuit layer to cascade the charges through radially branching channels to multiple storage units. Since such a sensor structure has not existed nor been tested, the technical feasibility of the sensor is confirmed with simulations and additional considerations.
First, a design strategy specific to the sensor structure is proposed and applied to a model case. The upper photoelectron conversion and the lower circuit layers are separately analyzed to solve associated problems, and, then, the overall performance is evaluated for the whole sensor structure.
Under very conservative design conditions with cooling, the sensor achieves theoretically noiseless imaging with the frame interval of less than 413 ps and the frame count of 12 to 48. In the near future, the frame count can be increased with a CMOS readout operation to a stacked memory chip, sacrificing the noiseless image capture.