Direct Annihilation Position Classiﬁcation Based on Deep Learning Using Paired Cherenkov Detectors: A Monte Carlo Study

: To apply deep learning to estimate the three-dimensional interaction position of a Cherenkov detector, an experimental measurement of the true depth of interaction is needed. This requires signiﬁcant time and e ﬀ ort. Therefore, in this study, we propose a direct annihilation position classiﬁcation method based on deep learning using paired Cherenkov detectors. The proposed method does not explicitly estimate the interaction position or time-of-ﬂight information and instead directly estimates the annihilation position from the raw data of photon information measured by paired Cherenkov detectors. We validated the feasibility of the proposed method using Monte Carlo simulation data of point sources. A total of 125 point sources were arranged three-dimensionally with 5 mm intervals, and two Cherenkov detectors were placed face-to-face, 50 mm apart. The Cherenkov detector consisted of a monolithic PbF 2 crystal with a size of 40 × 40 × 10 mm 3 and a photodetector with a single photon time resolution (SPTR) of 0 to 100 picosecond (ps) and readout pitch of 0 to 10 mm. The proposed method obtained a classiﬁcation accuracy of 80% and spatial resolution with a root mean square error of less than 1.5 mm when the SPTR was 10 ps and the readout pitch was 3 mm.


Introduction
Clinical time-of-flight positron emission tomography (TOF-PET) scanners with 200-600 picosecond (ps) coincidence time resolution (CTR) have been developed [1][2][3]. In laboratory studies, CTR has approached~30 ps at the detector level [4,5]. TOF information determines the annihilation position on the coincidence line connecting the two interaction positions of γ-rays with detectors, with the position uncertainty ∆x ≈ c∆t/2, where c is the speed of light and ∆t is the CTR. For example, if ∆t = 200 ps, ∆x = 30 mm. Although the position uncertainty of 30 mm is not comparable to the spatial resolution of clinical PET scanners (4-6 mm), the signal-to-noise ratio of the reconstructed image can be improved depending on the CTR and patient size using the TOF information in the reconstruction process [3]. If we could develop a TOF-PET detector with 26-40 ps CTR, corresponding to a position uncertainty of 4-6 mm, annihilation points could be determined event-by-event. Therefore, the image reconstruction process, which can amplify the noise of PET images, could be removed from the clinical PET.
Recently, Cherenkov radiation-based TOF-PET detectors have been developed [6][7][8]. Because the emission of the Cherenkov photon is as fast as a few ps, the detection of Cherenkov photons could potentially approach 10 ps CTR [9]. However, to stop the γ-ray using the Cherenkov radiator, such as a monolithic lead fluoride (PbF2) crystal with the same detection efficiency as a lutetium-based scintillator, thicker crystal is needed, for example, 20 mm. As the radiator thickness increases, the depth of interaction (DOI) of γ-rays causes timing errors because of the propagation speed difference between γ-rays and photons in crystals [10][11][12]. Thus, to achieve an extremely good CTR on the order of 10 ps, both TOF and DOI information will be needed. However, as fewer than 30 Cherenkov photons are produced by the 511 keV γ-ray [9,13], DOI estimation becomes challenging. Hence, DOI estimation methods based on machine learning have been developed for Cherenkov detectors [8,14]. Although the machine learning approach enables the Cherenkov detector to estimate DOI, it requires a dataset equipped with correct DOI. The measurement of the correct DOI for the Cherenkov detector requires substantial time and effort. Some methods that can be used to avoid this problem have been developed in detectors based on a monolithic scintillator [15][16][17][18]. However, these methods could not be applied to the monolithic Cherenkov detector. From a different point of view, the cause of the problem could be that we try to indirectly estimate the annihilation position from the TOF and DOI. If we can estimate the annihilation position directly from the spatial and temporal photon distribution measured by a pair of Cherenkov detectors, there would be no need to estimate the DOI in the first place.
In this study, we propose a direct annihilation position classification based on deep learning using paired Cherenkov detectors. We estimate the annihilation position directly from the photon information measured by paired Cherenkov detectors using deep learning instead of intermediate estimation of TOF and DOI. Deep learning is a powerful tool that has attracted the interest of researchers in the medical imaging field [19][20][21]. Recently, deep learning has been used to process raw data bypassing image reconstruction, such as ghost cytometry [22] or sinogram-space machine learning [23]. In the same spirit, we use deep learning to bypass the intermediate estimation of DOI and TOF and estimate the annihilation position directly from the raw data of photon information measured by paired Cherenkov detectors. We performed a Monte Carlo simulation with 125 point sources and trained a deep neural network (DNN) predicting the point source from which the γ-ray pair originated based on the photon information captured by paired Cherenkov detectors. We evaluated a classification accuracy and point spread function (PSF) over various single photon time resolutions (SPTR) and readout pitches. The point source position can be determined easily even in the experiment so that the above problem concerning the measurement of correct DOI is eliminated. Figure 1 shows the paired Cherenkov detectors used in this study. The two Cherenkov detectors were placed face-to-face, 50 mm apart. The origin of the coordinate system, (x, y, z) = (0, 0, 0), was defined at the center of the detector plane for (x, y) and at the center between the pair of the detectors at the z-axis, as shown in Figure 1. For the Cherenkov radiator, we used a monolithic PbF 2 crystal with a size of 40 × 40 × 10 mm 3 in the (x, y, z) direction. The PbF 2 crystal is appropriate as a Cherenkov radiator because it has a high density, high refractive index, and transparency toward the UV region, as shown in Table 1 [24,25]. Each photodetector was topped with 300-µm-thick protection resin (n = 1.5), and the photodetector array was connected to a radiator by 100-µm-thick grease, as shown in Figure 1. All crystal surfaces were painted black to reduce the number of reflections, except the surface where a photodetector array was coupled. As stated in the previous section, because the number of Cherekov photons is small, the position-time information of all photons should be read out individually to fully utilize limited information.

Paired Detectors
For practical consideration, we modeled an SPTR of the photodetector as a Gaussian distribution of σ = 0 to 100 ps in intervals of 10 ps. The 10 ps SPTR is possible for MCP-PMT [26], but 50-100 ps is realistic for silicon photomultipliers (SiPM) [27]. Further, we changed the readout pitch of the photodetector in the range of 0 to 10 mm with an interval of 1 mm. The photodetection efficiency (PDE) was assumed to be 100 % for simplicity. The PDE does not affect whether the direct annihilation position classification is possible, although it affects the sensitivity of the system. The photon detection threshold was assumed to be five, except in the detection efficiency study, to confirm the feasibility of our proposed method under the same conditions as our previous studies [8,14], wherein the same photon detection threshold value was employed.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 14 photodetector in the range of 0 to 10 mm with an interval of 1 mm. The photodetection efficiency (PDE) was assumed to be 100 % for simplicity. The PDE does not affect whether the direct annihilation position classification is possible, although it affects the sensitivity of the system. The photon detection threshold was assumed to be five, except in the detection efficiency study, to confirm the feasibility of our proposed method under the same conditions as our previous studies [8,14], wherein the same photon detection threshold value was employed. were placed face-to-face, 50 mm apart. The origin of the coordinate system, (x, y, z) = (0, 0, 0), was defined at the center of the detector plane for (x, y) and at the center between the pair of the detectors at the z-axis. A monolithic PbF2 crystal with a 40 × 40 × 10 mm 3 Cherenkov radiator was coupled to a photodetector array through 300-μm-thick protection resin (n = 1.5) with grease (n = 1.5).

Simulation Dataset
A Monte Carlo simulation was performed to create a dataset using Geant4 [28] with the same physical parameters as in Ota et al. [8], except for the detector dimension. The Geant4 Livermore libraries were used for low-energy electromagnetic processes; both the photoelectric effect and Compton scattering were implemented according to Brunner et al. [29]. Geant4 has a maximum step length (MSL) parameter that balances the accuracy and computation time for particle tracking. For simulation with Cherenkov effects in the low-energy region, an appropriate setting of the MSL is important. We set the MSL to 1 μm. The optical properties of PbF2, for example, the transmittance, refractive index, and density, were determined based on the definition of Anderson et al. [24] and the Refractive Index Database [30]. Figure 1 shows a simulation setup of a paired Cherenkov detector and 125 training source positions. The two Cherenkov detectors were placed face-to-face, 50 mm apart. The origin of the coordinate system, (x, y, z) = (0, 0, 0), was defined at the center of the detector plane for (x, y) and at the center between the pair of the detectors at the z-axis, as shown in Figure 1. A total of 10 7 emissions of the γ-ray pair were performed at each source position, where x, y, and z were scanned from 0 to 10 mm in intervals of 5 mm. Thus, the 3 × 3 × 3 = 27 source positions in the first quadrant were simulated, as were placed face-to-face, 50 mm apart. The origin of the coordinate system, (x, y, z) = (0, 0, 0), was defined at the center of the detector plane for (x, y) and at the center between the pair of the detectors at the z-axis. A monolithic PbF 2 crystal with a 40 × 40 × 10 mm 3 Cherenkov radiator was coupled to a photodetector array through 300-µm-thick protection resin (n = 1.5) with grease (n = 1.5).

Simulation Dataset
A Monte Carlo simulation was performed to create a dataset using Geant4 [28] with the same physical parameters as in Ota et al. [8], except for the detector dimension. The Geant4 Livermore libraries were used for low-energy electromagnetic processes; both the photoelectric effect and Compton scattering were implemented according to Brunner et al. [29]. Geant4 has a maximum step length (MSL) parameter that balances the accuracy and computation time for particle tracking. For simulation with Cherenkov effects in the low-energy region, an appropriate setting of the MSL is important. We set the MSL to 1 µm. The optical properties of PbF 2 , for example, the transmittance, refractive index, and density, were determined based on the definition of Anderson et al. [24] and the Refractive Index Database [30]. Figure 1 shows a simulation setup of a paired Cherenkov detector and 125 training source positions. The two Cherenkov detectors were placed face-to-face, 50 mm apart. The origin of the coordinate system, (x, y, z) = (0, 0, 0), was defined at the center of the detector plane for (x, y) and at the center between the pair of the detectors at the z-axis, as shown in Figure 1. A total of 10 7 emissions of the γ-ray pair were performed at each source position, where x, y, and z were scanned from 0 to 10 mm in intervals of 5 mm. Thus, the 3 × 3 × 3 = 27 source positions in the first quadrant were simulated, as shown in the red rectangle of Figure 1. By utilizing symmetry, 98 source positions were replicated from those of the 27 source positions, as shown by the green dots in Figure 1. For example, by flipping the sign of x for the detected photon information, the simulation data of the source position (x, y, z) = (−5, 0, 0) can be generated from the simulation data of the source position (x, y, z) = (5, 0, 0). Thus, 125 (5 3 ) source positions were simulated in total. The momentum direction of the γ-ray pair was randomly determined against the polar and azimuth angles, which were varied from −π/4 to +π/4 and 0 to 2π, respectively. In all simulations, we recorded the following information event-by-event.
As the photon detection threshold was assumed to be five, a total of 227,005 coincidence events in which both detectors detected more than five Cherenkov photons were obtained. This number was only 0.02 % of all γ-ray pair emissions. We split this set to assign 75% for training and 25% for testing. To increase the number of coincidence events in the training dataset, we extracted the N C 5 × M C 5 events from coincidence events that were 6 ≤ N or 6 ≤ M. For example, in the case of N = 6 and M = 6, we could extract 36 coincidence events in which N = 5 and M = 5. For the test dataset, we extracted information for the first five photons from each single event, constructing the coincidence event that is 6 ≤ N or 6 ≤ M. It is known that if the number of data in each class is imbalanced, the performance of the classifier may deteriorate. Therefore, we applied the under-sampling technique [31]. The number of coincidence events for each source position was under-sampled to the minimum number of coincidence events between each source position. In the results, the training and testing datasets had 6,666,375 and 39,625 coincidence events, respectively.
To monitor the training, we considered 90% of the training dataset as an actual training dataset and 10% of the training dataset as the validation dataset.

Network Architecture
We propose the multilayer perceptron (MLP) architecture illustrated in Figure 2 for the direct classification of the annihilation position from the input data of paired Cherenkov detectors. The proposed MLP consists of an input layer, three hidden layers, and an output layer. The input layer receives the xy position and time information of each photon measured by Cherenkov detectors 1 and 2, respectively. In this study, as we assumed a photon detection threshold of five for each Cherenkov detector, the input layer has thirty units, as shown in Figure 2. The input units are affine transformed to a hidden layer with 256 units. Each unit of the hidden layer is normalized by batch-normalization [32] and activated by a rectified linear unit (ReLU) [33]. A similar process is repeated for each hidden layer. Finally, the final hidden layer is affine transformed to the output layer. The output units are activated by the soft-max function. We used cross-entropy as a loss function.

Classification Accuracy
The classification accuracies defined below were measured for each SPTR and readout pitch. Training was performed on a computer running Ubuntu 16.04 with a graphical processing unit (NVIDIA GeForce GTX 1080 with 8 GB of memory) using a Chainer 6.0.0 [34]. Each variable of input data was standardized with zero mean and unit variance by a standard-scaler function of the sci-kit learn library [35]. We performed 100 epochs of training using the Adam method [36] and weight decay of 10 −4 as regularization.

Classification Accuracy
The classification accuracies defined below were measured for each SPTR and readout pitch.
where N test is the number of coincidence events in the test dataset and label true is the true label. In this study, N test was 39,625.

Confusion Matrix
To visualize how MLP performed misclassification, we summarized all predictions for the test dataset in the confusion matrix [37] defined below.
ConfusionMatrix label true ,label = event(label) wherelabel is a predicted label. The confusion matrix represents the number of events label true predicted aslabel. If all events were classified correctly, they were collected in the diagonal element of the confusion matrix. As we have 125 labels consisting of 5 (x) × 5 (y) × 5 (z), we created a 5 × 5 confusion matrix for each direction x, y, and z.

Point Spread Function
To visualize the PSF, we estimated the continuous-valued annihilation position as the center of gravity of the probability output by the MLP.
where (x,ŷ,ẑ) is the estimated continuous-valued annihilation position and Position s is the predicted probability that the coincidence event came from the source position (x s , y s , z s ). As a measure of spatial resolution, the root mean square errors (RMSE) of the x-, y-, and z-position estimation were calculated as those in Tabacchini et al. [38].

Detection Efficiency
To evaluate the system sensitivity, the γ-ray pair detection efficiency defined below was calculated.
where τ is the photon detection threshold and N emission is the number of γ-ray pair emissions. In this study, N emission was 1.25 × 10 9 (125 × 10 7 ). Figure 3 shows the training and validation loss curves at an SPTR of 0 ps and a readout pitch of 0 mm, and an SPTR of 10 ps and readout pitch of 3 mm, respectively. The differences between training and validation loss were small, so there was no overfitting of the MLP for each condition.

Detection Efficiency
To evaluate the system sensitivity, the γ-ray pair detection efficiency defined below was calculated.
where τ is the photon detection threshold and Nemission is the number of γ-ray pair emissions. In this study, Nemission was 1.25 × 10 9 (125 × 10 7 ). Figure 3 shows the training and validation loss curves at an SPTR of 0 ps and a readout pitch of 0 mm, and an SPTR of 10 ps and readout pitch of 3 mm, respectively. The differences between training and validation loss were small, so there was no overfitting of the MLP for each condition.  Figure 4 shows the classification accuracy relative to the SPTR and readout pitch. Table 2 shows the classification accuracy for the representative conditions.  Figure 4 shows the classification accuracy relative to the SPTR and readout pitch. Table 2 shows the classification accuracy for the representative conditions.      Figure 5 shows the confusion matrix in the xand z-directions. Figure 5a,b show the results at an SPTR of 10 ps and 100 ps, respectively. The results in the y-direction were omitted because of symmetry. The readout pitch was fixed at 3.0 mm. As the number of events in the test dataset was 39,625, the maximum number of events in diagonal elements was considered to be 7925. In the confusion matrix in the z-direction at an SPTR of 100 ps, the diagonal elements had expanded to nondiagonal elements. . Confusion matrix in the x-and z-directions at an SPTR of (a) 10 ps, and (b) 100 ps, respectively. Results in the y-direction were omitted because these were almost same as those in the x-direction. Readout pitch was fixed at 3 mm. As the number of events in the test dataset was 39,625, the maximum number of events in diagonal elements is considered to be 7925. Figure 6a,b show the PSF as an error distribution of the estimated continuous-valued annihilation position at an SPTR of 10 ps and 100 ps, respectively. The readout pitch was fixed at 3 mm. Figure 6c,d show the projection of the PSF on the x-and z-axis, respectively. Results on the y- Figure 5. Confusion matrix in the xand z-directions at an SPTR of (a) 10 ps, and (b) 100 ps, respectively. Results in the y-direction were omitted because these were almost same as those in the x-direction. Readout pitch was fixed at 3 mm. As the number of events in the test dataset was 39,625, the maximum number of events in diagonal elements is considered to be 7925. Figure 6a,b show the PSF as an error distribution of the estimated continuous-valued annihilation position at an SPTR of 10 ps and 100 ps, respectively. The readout pitch was fixed at 3 mm. Figure 6c,d show the projection of the PSF on the xand z-axis, respectively. Results on the y-axis were omitted because they were very similar to those on the x-axis. There were side peaks at ±5 mm, as shown in Figure 6c. The PSF was blurred especially along the z-direction at an SPTR of 100 ps, as shown in Figure 6c. Results on the y-axis were omitted because they were very similar to those on the x-axis. Readout pitch was fixed at 3 mm. Figure 7a,b show the RMSE of the x-and z-position estimations, respectively, relative to the SPTR and readout pitch. The results of the y-position estimation were omitted because they were very similar to those of the x-position estimation. As shown in Figure 7a, the RMSE of the x-position estimation was almost the same for a readout pitch of 0 to 3 mm. As shown in Figure 7b, the RMSE of the z-position estimation was almost the same for all readout pitches. Table 3 shows the RMSE under representative conditions. Results on the y-axis were omitted because they were very similar to those on the x-axis. Readout pitch was fixed at 3 mm. Figure 7a,b show the RMSE of the x-and z-position estimations, respectively, relative to the SPTR and readout pitch. The results of the y-position estimation were omitted because they were very similar to those of the x-position estimation. As shown in Figure 7a, the RMSE of the x-position estimation was almost the same for a readout pitch of 0 to 3 mm. As shown in Figure 7b, the RMSE of the z-position estimation was almost the same for all readout pitches. Table 3 Figure 8 shows the detection efficiency for the γ-ray pair emission by the Cherenkov detector pair used in this study. For a photon detection threshold of 1, the detection efficiency was 4.3%.   Figure 8 shows the detection efficiency for the γ-ray pair emission by the Cherenkov detector pair used in this study. For a photon detection threshold of 1, the detection efficiency was 4.3%.

Discussion
We validated the feasibility of direct annihilation position classification based on DNNs from photon information measured by paired Cherenkov detectors. We used Monte Carlo simulation data of 125 point sources for validation. In addition, we evaluated the influence of the SPTR and readout pitch on the classification accuracy and spatial resolution.

Discussion
We validated the feasibility of direct annihilation position classification based on DNNs from photon information measured by paired Cherenkov detectors. We used Monte Carlo simulation data of 125 point sources for validation. In addition, we evaluated the influence of the SPTR and readout pitch on the classification accuracy and spatial resolution.
The classification accuracy was 96% at ideal conditions where the SPTR was 0 ps and the readout pitch was 0 mm (Table 2). This indicates that the proposed method is feasible under ideal conditions. As the classification accuracies of the readout pitch of 0 to 3 mm were almost the same (Figure 4), a readout pitch of 3 mm was considered to be adequate. In contrast, the classification accuracies dropped rapidly as the SPTR worsened. To maintain a classification accuracy of 80%, the SPTR must be 10 ps or less (Figure 4). In addition, we evaluated the confusion matrix to determine whether the MLP made major mistakes, because classification accuracy as the figure of merit does not distinguish between major and minor mistakes. When the SPTR was 10 ps and the readout pitch was 3 mm, misclassification was concentrated on the adjacent positions (Figure 5a). This can be interpreted as evidence that the MLP rarely made major mistakes in these conditions. In contrast, as the SPTR worsened to 100 ps, the misclassification was broadly distributed in the z-direction (Figure 5b). These results indicate that an SPTR of 10 ps or less is preferable in view of the magnitude of misclassification.
To better understand the effects of the SPTR and readout pitch, we visualized the PSF by estimating the continuous-valued annihilation position as the center of gravity of the probability output by the MLP (Figure 6). When the SPTR was 10 ps and the readout pitch was 3 mm, the PSF was isotropic (Figure 6a). The magnitudes of misclassification in the x-, y-,and z-directions were almost equal and concentrated in adjacent positions. In contrast, as the SPTR worsened to 100 ps, the PSF was elongated in a bar shape along the z-direction (Figure 6b). This can be interpreted as an indication that the SPTR has a greater effect in the z-direction than in the xand y-directions. The projections of the PSF on the xand z-axis show that the shape of the PSF was sharp and had side peaks (Figure 6c). This means that Gaussian fitting is not adequate for analysis of the spatial resolution. Hence, we calculated the RMSE to determine spatial resolution as in Tabacchini et al. [38]. The RMSE of the x-position estimation was slightly affected by the readout pitch (Figure 7a). This was consistent with the results of 3-D interaction position estimation based on deep learning using the Cherenkov detector [14]. The RMSE of the z-position estimation was almost the same for all readout pitches and rapidly degraded as the SPTR worsened (Figure 7b). These results may indicate that the z-position was estimated using only time information. When the SPTR was 10 ps and the readout pitch was 3 mm, the proposed method obtained a σ x of 1.12 mm and σ z of 1.42 mm ( Table 3).
The sensitivity of the proposed system was low ( Figure 8). As we focused on whether direct annihilation position classification is possible in this study, we did not optimize the system to increase the sensitivity. In future studies, we aim to optimize the system parameters, for example, the distance between detectors, detector size, thickness, and surface treatment of the crystal, and the number of paired detectors, to improve the resolution and sensitivity. For example, Consuegra et al. optimized the thickness and surface treatment of PbF2 crystals of whole-body PbF2 TOF-PET scanners, using a figure of merit which was defined as the ratio between detection efficiency and CTR [39]. In addition, the photon detection threshold can be optimized to control the tradeoff between resolution and sensitivity. For example, when the SPTR was 0 ps and readout pitch was 0 mm, the proposed method obtained classification accuracies of 95% and 92% at photon detection thresholds of four and three, respectively (data not shown).
The limitation of this study is that the validation only used the simulation data of point sources. In future studies, we plan to experimentally validate the simulation results. For the experimental setup, a pair of Cherenkov detectors consisting of a position sensitive photodetector and a Cherenkov radiator would be used. In parallel, we will simulate a phantom to evaluate the image quality of the proposed system.

Conclusions
This study demonstrated the feasibility of direct annihilation position classification based on deep learning using paired Cherenkov detectors. In the Monte Carlo simulation, the proposed method obtained a classification accuracy of 80% and a spatial resolution of less than 1.5 mm (RMSE) when the SPTR was 10 ps and the readout pitch was 3 mm. As the classification accuracies for readout pitches of 0 to 3 mm were almost equal, a readout pitch of 3 mm could be sufficient. Furthermore, because the RMSE of the z-position estimation was almost equal for readout pitches from 0 to 10 mm, it may be that the z-position could be estimated from only time information. In future studies, we will optimize the system parameters to increase the sensitivity.