Arbitrary Configurable 20-Channel Coincidence Counting Unit for Multi-Qubit Quantum Experiment

Abstract: This paper presents a 20-channel coincidence counting unit (CCU) implemented in a low-end field-programmable gate array (FPGA). The CCU can be configured arbitrarily to measure from twofold to twentyfold coincidence counts thanks to a multifold controllable architecture, which is easily manipulated through a graphical user interface (GUI) program. In addition, it provides the single counts of all 20 input signals simultaneously. The experimental results show twentyfold coincidence counting, with a coincidence window resolution of less than 0.5 ns. These characteristics make the CCU suitable for various quantum optics experiments using multi-photon qubits.


Introduction
A coincidence counting unit (CCU) is a module that counts the coincidences of two or more electrical inputs. CCUs are widely used in experiments on quantum optics [1][2][3][4][5][6][7][8][9][10], quantum communication [11][12][13][14][15], and the measurement of radioactive isotopes, among other applications [16]. As techniques advance in all of these fields, there is increasing demand for CCU devices with more functions. Specifically, experiments that use multiple photons for quantum computing require a CCU that can detect multifold coincidences with a large number of input channels [17,18]. Moreover, basic experiments for revealing the correlation of entangled photons need many multifold coincidence outputs and the single input counts from the photon detectors simultaneously. Therefore, the development of a low-cost, high-performance CCU with a large number of input channels that can detect multifold coincidences is increasingly important for recent experiments utilizing multiple photons.
Historically, analog CCUs, called time-to-amplitude converters (TACs), have been used in radiation metrology to measure the activity of a sample exactly [19]. However, the channel capacity of TACs was difficult to expand because the analog input circuits were complex and bulky. To overcome this issue, time-to-digital converters (TDCs) were proposed for multi-photon experiments [20][21][22][23][24]. These components precisely measure the timing difference between start and stop signals, which is especially beneficial for radar and positron emission tomography. However, because a TDC can only measure twofold coincidences, multifold coincidences require post-processing. A CCU built from digital circuits can address this issue, and it has additional advantages such as a small chip size, scalability, a high detection rate, and a low cost. CCUs utilizing emitter-coupled logic (ECL) [25] and transistor-transistor logic (TTL) [26,27] have been reported. Furthermore, those IC chip implementations can be realized using a single field-programmable gate array (FPGA) [28][29][30][31][32][33][34]. An FPGA-based CCU has many advantages. The input signals can be processed simultaneously in parallel. In addition, its programmability provides the flexibility to change the configuration to suit a specific experiment. The coincidence time window, which is the time range within which a coincidence is detected, can be easily controlled by reprogramming the FPGA or reconfiguring its architecture. This is useful for experiments evaluating quantum correlations from photons to atoms. Reported coincidence time windows are in the few-ns [28][29][30][33] and sub-ns [31,32] ranges.
In multi-photon experiments, a small coincidence time window reduces the noise that arises from using many detectors, and a sub-ns coincidence time window resolution satisfies quantum optics experiments that must account for the jitter of single-photon detectors. Although an FPGA-based CCU cannot match the coincidence window resolution of an analog one, its other characteristics make it a prevalent type of CCU architecture these days. With those advantages and powerful functions, the CCU has been applied in various quantum experiments, including our group's research [7][8][9][10][14,15]. A 48-channel CCU using an FPGA was developed for multi-photon experiments with up to sixfold coincidence measurement [32]. It showed that quantum experiments using multi-photon entanglement or multiple degrees of freedom of a single photon need a CCU that has many inputs and outputs for multifold coincidence measurements.
In this paper, we present a CCU that can detect up to twentyfold coincidences using a low-cost FPGA. It simultaneously provides 20 input signal counts and 20 coincidence counts. Compared with our previous version [31], this upgrades not only the number of inputs and outputs, but also the flexibility for adapting to various experiments through efforts focused on architecture optimization for equivalent delays of the signal critical path, the input signal's pulse shaping, and so on. In addition, the CCU can arbitrarily configure its architecture to measure the coincidences of every combination of 20 inputs using multiplexers (MUXs). This reconfiguration is easily controllable and can be set at the moment when the user starts to operate it by using a graphical user interface (GUI). The count accumulation time for single inputs and coincidences can be expanded to the user's needs by improving the storage methods. Moreover, the user-friendly GUI displays all the input and output counts. We developed a CCU using an FPGA and other peripherals, such as ports and a communication chip, which can be chip-sized for portable devices.
The remainder of this paper is organized as follows. Section 2 explains the CCU architecture in detail. In Section 3, we discuss the experimental details and their results. In particular, the characteristics of the reconfigurable architecture and the coincidence window are discussed in detail. Section 4 summarizes and concludes our work.

Figure 1 shows the overall architecture of the CCU. The input signals are processed by functional blocks (internal delay, pulse reshaping, coincidence signal generator, and counter) to count the coincidences of the inputs. The internal delay block, composed of delay buffers and a MUX, allows the user to adjust the delays of the electronic input signals. This is a convenient function for experimental setups that need different detection timings. The input signals can be delayed in steps of approximately 0.7 ns, by up to 100 ns. The pulse-reshaping block consists of a JK flip-flop (JK-FF), buffers, an EX-OR gate, and a MUX, as shown in Figure 2a. The pulse-reshaping block is essential for coincidence counting because it controls the coincidence time window using a pulse-reshaping technique, which is described in Figure 2b-d for three cases. Figure 2b shows the general operation of the pulse-reshaping block. Comparing signal 1 (input) with signal 4 (output) shows that the input pulse width is shortened. Signals 2 and 3 are the output of the JK-FF and the same signal delayed by the buffers, respectively. The EX-OR of those two signals reshapes the input pulse for a small coincidence time window. Our pulse-reshaping block can also increase the coincidence time window by making the input pulse wider (see Figure 2c). However, if the interval between input pulses is shorter than the delay time of the buffers, an unintended output can be generated, as described in Figure 2d.
Therefore, the maximum delay time of the buffer should be determined by considering the maximum input repetition rate. The coincidence time window can be controlled by setting the number of pulse-reshaping steps, from 2 to 30, which is the number of buffers the signal passes through. Coincidences are detected at the coincidence signal generator block by an AND gate fed with the reshaped input pulses; the overlap of the reshaped pulses produces the coincidence signal. The counter block counts the 20 single channels and 20 coincidence signals simultaneously. After a user-defined data accumulation time, the processor block collects the 40 counts and transfers them to a personal computer (PC) through a USB port.
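The pulse-reshaping operation described above can be sketched in a few lines of Python. This is a minimal discrete-time model, not the FPGA implementation: it assumes that the JK-FF is wired to toggle on every rising edge of the input and that the buffer chain behaves as a pure delay of `delay` samples. The function name `reshape_pulse` and the sample-based time grid are illustrative choices that do not appear in the original design.

```python
def reshape_pulse(samples, delay):
    """Model of the pulse-reshaping block on a discrete time grid.

    samples: list of 0/1 logic levels of the input (signal 1).
    delay:   buffer-chain delay in samples (assumed to be a pure delay).
    Returns the reshaped output (signal 4) as a list of 0/1 levels.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")

    # Signal 2: flip-flop output, assumed to toggle on each rising edge.
    toggled = []
    state = 0
    previous = 0
    for level in samples:
        if level == 1 and previous == 0:
            state ^= 1
        toggled.append(state)
        previous = level

    # Signal 3: the flip-flop output delayed by the buffer chain.
    delayed = ([0] * delay + toggled)[:len(toggled)]

    # Signal 4: EX-OR of signals 2 and 3.
    return [a ^ b for a, b in zip(toggled, delayed)]


# Hypothetical input: a single pulse that is 6 samples wide.
pulse = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
narrow = reshape_pulse(pulse, 2)  # output pulse shorter than the input
wide = reshape_pulse(pulse, 8)    # output pulse longer than the input
print(sum(narrow), sum(wide))
```

In this model the output width is set by `delay` rather than by the input width, which covers both the shortening case (Figure 2b) and the widening case (Figure 2c). When two rising edges arrive closer together than `delay`, the same model yields output pulses shorter than `delay`, which corresponds to the unintended output of Figure 2d.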

Figure 1. Overall architecture of the coincidence counting unit (CCU). All the required functional blocks are implemented in a low-end field-programmable gate array (FPGA). The processor controls all the functional blocks and transfers data to a personal computer via a USB connection. All the parameters can be controlled with the graphical user interface (GUI) software.
Electronics 2021, 10, x FOR PEER REVIEW

If the number of inputs is simply expanded, the performance of the individual input channels may differ severely. Therefore, to support a 20-channel multifold CCU with identical high performance in each channel, we made considerable efforts to implement two significant blocks. The first was the pulse-reshaping block. In Figure 2a, the red lines are the critical path of the CCU for reshaping the pulse width of the input signal. Each input channel has its own pulse-reshaping block, so there are as many critical paths as there are inputs. Because the buffers in the pulse-reshaping blocks are placed differently, the critical path of each input is different. For this reason, the reshaped input pulse can have a different width for each input. To overcome this undesirable characteristic, we used a logic lock region, which is a function of the FPGA programming tool. With this function, we can place the buffers in a designated region of the FPGA and minimize the difference between buffer delays. All functional blocks of the CCU were locked, and their placements were optimized in the FPGA. As a result, the statistical differences of the signal delays were minimized.
Second, to generate the coincidence signal of arbitrarily selected inputs among the 20 inputs, we developed the coincidence signal generator shown in Figure 3. Using a 20-input AND gate and 20 MUXs, users can arbitrarily select the inputs that generate a coincidence signal. To obtain identical performance for any configuration, we designed and synthesized the critical path lines carefully. When the user disables an input, the select input of its MUX feeds VCC (high level) into the AND gate instead of the original input signal. Furthermore, 20 sets of the coincidence signal generator block, each consisting of an AND gate and 20 MUXs, generate 20 coincidence signals simultaneously. Each input channel is connected separately to a MUX in each of the 20 coincidence signal generator blocks, which duplicates the input signal. The duplicated input pulses are distributed among the 20 AND gates to generate 20 independent coincidence outputs. Therefore, users can arbitrarily and independently select the combination of inputs for each of the 20 coincidence outputs.
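The MUX-and-AND selection scheme of the coincidence signal generator can be illustrated with a short logic-level model. The sketch below is only a software analogy of the hardware described above; the names `coincidence_output` and `all_outputs`, and the representation of the configuration as rows of booleans, are assumptions made for illustration.

```python
NUM_CHANNELS = 20


def coincidence_output(inputs, enabled):
    """One coincidence signal generator: a 20-input AND gate behind 20 MUXs.

    inputs:  20 logic levels (0 or 1) of the reshaped input pulses.
    enabled: 20 booleans; a disabled input is replaced by logic high (VCC),
             so it no longer influences the AND result.
    """
    if len(inputs) != NUM_CHANNELS or len(enabled) != NUM_CHANNELS:
        raise ValueError("expected exactly 20 inputs and 20 select flags")
    result = 1
    for level, use in zip(inputs, enabled):
        result &= level if use else 1
    return result


def all_outputs(inputs, config):
    """Evaluate every generator block; config holds one row of flags per output."""
    return [coincidence_output(inputs, row) for row in config]


# Hypothetical snapshot: only channels 0 and 1 are high at this instant.
levels = [1, 1] + [0] * 18
pair_01 = [True, True] + [False] * 18          # twofold: channels 0 and 1
pair_02 = [True, False, True] + [False] * 17   # twofold: channels 0 and 2
print(all_outputs(levels, [pair_01, pair_02]))
```

Replacing a deselected input by a constant high level keeps the AND gate at a fixed 20-input width, so the same structure serves every fold from two to twenty. Note that in this simplified model a row with no input selected stays at logic high.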
Moreover, the GUI program and hardware of the CCU were developed so that the configuration of the 20 coincidence outputs can be controlled easily. Compared with the previous version, a much larger number of system parameters has to be handled because of the 20 inputs and outputs. The parameters set in the GUI program are transferred to the functional blocks in the FPGA through a USB connection, as shown in Figure 4. The input delay and coincidence time window parameters, which are transferred to the internal delay and pulse-reshaping blocks, respectively, are controllable in real time with numeric up-down controllers. The combination of the 20 inputs used for coincidence signal generation is controlled with 20 check boxes, and the GUI contains 20 such groups of 20 check boxes for counting 20 coincidences simultaneously. The accumulation time for the signal count is also adjustable using a numeric up-down controller. The accumulated counts (20 single input counts and 20 coincidence counts) are displayed concurrently in the text boxes. Furthermore, all the data can be stored in the GUI program according to the user's needs, to an extent that depends on the user's computer.

The overall architecture of the CCU was implemented in an FPGA with 17,476 logic elements (LEs), which is 78% of the LEs of a Cyclone IV FPGA chip (EP4CE22F17C6N) from Intel, according to a Quartus II compilation report. The remaining LEs could be used for debugging and for additional delay of the electronic input signals. The detailed resource utilization of the CCU is given in Table 1. All components of the CCU were packaged in a module with a size of 240 × 175 × 50 mm³. As shown in Figure 5, the 20 input ports and the USB port for data transmission are placed on the front and back sides of the packaged box, respectively. For experiments demanding more than 20 inputs, the CCU could be extended by moving to a higher-end FPGA. In fact, the 20 channels of the CCU consume only 10% of the LEs of a Stratix family FPGA. Additionally, the CCU could be developed with an Artix, Kintex, or Virtex family FPGA from a different vendor because it uses fewer than 20,000 conventional logic elements and no special elements.

Experiments
To evaluate the 20-channel CCU, we conducted experiments to characterize four specific features. First, we investigated the range of acceptable input frequencies. To observe the performance, we simultaneously applied two synchronized TTL pulses to the input ports of the CCU with a 3 ns coincidence time window. Figure 6 shows the average counting rates as a function of the input frequency. Because the input pulses were synchronized, the coincidence count rates were the same as the single input count rates. The measured single count and coincidence count rates were equal to the input frequency until the input frequency approached 400 MHz. However, once the input frequency exceeded 400 MHz, the coincidence counting rates deviated from the input frequency. Above 600 MHz, the coincidence count was almost zero because the amplitude of the coincidence signal became smaller than the acquisition level of the FPGA.

Second, we analyzed the size of the coincidence time window. We generated two synchronized TTL pulse sequences, one of which could be delayed in order to measure the coincidence time window accurately. We then fed those pulse sequences into the input channels of the CCU. While changing the timing between the two pulse sequences, we plotted the ratio of the coincidence (see Figure 7a). As noted previously, the pulse-reshaping block determines the coincidence time window according to the number of pulse-reshaping steps. In Figure 7a, we used pulse-reshaping steps of 2, 6, and 10, for which the coincidence time windows were 0.46 ns, 1.73 ns, and 3.05 ns, respectively. Furthermore, we estimated the coincidence time windows more precisely with accidental coincidence counts [35]. When asynchronous random Gaussian pulses are fed into two input channels, we can calculate the coincidence time window $T_C$ using the following equation:

$$T_C = \frac{R_{AB}}{R_A R_B},$$

where $R_A$, $R_B$, and $R_{AB}$ are the single counts of channels A and B and their accidental coincidence counts, respectively. Figure 7b shows the experimental results (dots) and the fitting data (lines). The coincidence windows with pulse-reshaping steps of 2, 6, and 10 were 0.3 ns, 1.3 ns, and 2.8 ns, respectively. The small differences between the coincidence time windows obtained in Figure 7a and Figure 7b result from the different shapes of the input pulses. We used a square wave for the experiment shown in Figure 7a, but Gaussian pulses were applied as inputs for the experiment shown in Figure 7b.

Third, to verify the 20 coincidence outputs, we arbitrarily generated four pulses as inputs of the CCU using a function generator. The four input pulses are depicted in Figure 8a. The frequencies of the input pulses were 10 MHz, 5 MHz, 10 MHz, and 10 MHz, with pulse widths of 10 ns, 10 ns, 50 ns, and 25 ns, respectively. The input pulses were synchronized with each other. Figure 8b shows that the single counts for the four input pulses were accurately displayed in the GUI program. The other single counts were all zero. To detect the coincidences for those inputs, as mentioned before, we could select a combination of the 20 inputs for coincidence signal generation using the check boxes. The 20 coincidence outputs were counted according to the selections in the configuration check boxes and shown in the GUI program. Because coincidence outputs 1-4 were each configured with a single channel, their text boxes showed around 10 M or 5 M counts, respectively. Twofold and threefold coincidences were measured in output text boxes 5-16, which are not shown in the figure. Additionally, fourfold coincidences were measured in output text boxes 17, 18, 19, and 20. All single and coincidence counts were displayed correctly, which means that the 20 outputs of the CCU were operating properly.
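The accidental-coincidence estimate of the coincidence time window used in the second experiment can be written as a small helper. This sketch assumes the standard relation for uncorrelated inputs, R_AB = R_A · R_B · T_C, with all rates expressed in counts per second; the function name `coincidence_window` and the example rates are hypothetical and are not measured values from this work.

```python
def coincidence_window(rate_a, rate_b, accidental_rate):
    """Estimate the coincidence time window T_C in seconds.

    Assumes uncorrelated inputs, so that the accidental coincidence rate is
    rate_a * rate_b * T_C. All rates are in counts per second.
    """
    if rate_a <= 0 or rate_b <= 0:
        raise ValueError("single count rates must be positive")
    return accidental_rate / (rate_a * rate_b)


# Hypothetical rates, chosen only to illustrate the arithmetic.
t_c = coincidence_window(rate_a=1.0e6, rate_b=2.0e6, accidental_rate=600.0)
print(f"estimated window: {t_c * 1e9:.2f} ns")
```

Because the estimate relies only on count rates, it does not require a calibrated delay between the two inputs, which is one reason it is a useful cross-check of the delay-scan measurement.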

Finally, we performed a test to confirm the twentyfold coincidence counting. We used an FPGA and a pulse generator to generate 19 synchronized signals and one timing-controlled signal, respectively. All signals were connected to the inputs of the CCU and reshaped with 3 steps at the pulse-reshaping block. By enabling all the inputs of the AND gate in the coincidence signal generator block, twentyfold coincidence signals were accumulated. We adjusted the signal timing of one input using a pulse generator with a high resolution of 10 ps while holding the 19 synchronized input pulses. We investigated all the channels and noticed that each channel had a different coincidence time window due to the different placement of the buffers in the pulse-reshaping blocks. Although the differences between buffer delays were minimized with the logic lock function, the coincidence windows of the channels in Figure 9 are distributed from 1.5 ns to 2.4 ns. The experimental results for each channel are depicted in different colors. It is possible to correct this using additional hardware, such as delay chips instead of the buffers in an FPGA. However, a delay chip consumes a lot of current and requires additional effort to match the features of the inputs, such as electrical noise and the path length of the peripheral circuits.
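One way to extract a window width from a delay scan such as the one described above is to measure the width of the coincidence probability curve at a fixed fraction of its maximum. The sketch below assumes a full-width-at-half-maximum definition with linear interpolation between scan points; the text does not state how the widths in Figure 9 were defined, so this definition, the function name `window_width`, and the sample data are all illustrative assumptions.

```python
def window_width(delays_ns, probabilities, level=0.5):
    """Width of the coincidence curve at `level` times its maximum.

    delays_ns:     scan delays in ns, in increasing order.
    probabilities: coincidence probability measured at each delay.
    Threshold crossings are located by linear interpolation.
    """
    if len(delays_ns) != len(probabilities) or not delays_ns:
        raise ValueError("need equally long, non-empty sequences")
    if max(probabilities) <= 0:
        raise ValueError("no coincidences in the scan")

    threshold = level * max(probabilities)
    above = [i for i, p in enumerate(probabilities) if p >= threshold]
    first, last = above[0], above[-1]

    def crossing(i_out, i_in):
        # Delay at which the curve crosses the threshold between a point
        # below it (i_out) and a neighbouring point at or above it (i_in).
        p_out, p_in = probabilities[i_out], probabilities[i_in]
        frac = (threshold - p_out) / (p_in - p_out)
        return delays_ns[i_out] + frac * (delays_ns[i_in] - delays_ns[i_out])

    left = crossing(first - 1, first) if first > 0 else delays_ns[first]
    if last < len(probabilities) - 1:
        right = crossing(last + 1, last)
    else:
        right = delays_ns[last]
    return right - left


# Hypothetical scan of one channel against the 19 synchronized inputs.
delays = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [0.0, 0.0, 1.0, 1.0, 0.0]
print(window_width(delays, probs))
```

Applying the same function to the scan of every channel would give one width per channel, which makes the channel-to-channel spread directly comparable.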

Discussion
Table 2 summarizes the performance of our CCU alongside related works. The TTL circuit-based CCU has 4 channels and can measure coincidences using all inputs, with a relatively coarse coincidence resolution [27]. The FPGA-based CCUs have a relatively large number of input channels, but only a small number of inputs can be used for the coincidence measurement [22,32,33]. The TDC-based work, which has two inputs, achieves a high resolution but only for twofold coincidences [24]. Compared with these related works, the proposed CCU offers up to twentyfold coincidence measurement with a reasonable coincidence window and a high counting rate. With this CCU, we expect that multi-qubit experiments for quantum computing can be supported and that the correlation of the qubits can be detected and revealed more easily. Moreover, the CCU can be built at chip scale to fit in a portable device at a relatively low cost.

Conclusions
We proposed a 20 channel CCU using a low-end FPGA. It has 20 inputs and 40 outputs for measuring up to twentyfold coincidences. The experimental results show that the measurement of a twentyfold coincidence was performed well, with less than 0.5 ns for the coincidence window. Since all the required functions are implemented in the FPGA, one can easily manipulate and arbitrarily configure the CCU for specific experiments using the GUI. With the consideration of the architectures for multifold coincidence, our CCU is ready to be employed in quantum experiments that utilize multi-qubit states.

Conflicts of Interest:
The authors declare no conflict of interest.