## 1. Introduction

Radiation sources should be handled carefully and controlled strictly. However, in the event of theft or loss of sources, or of acts of terrorism using such sources, it is necessary to rapidly identify multiple sources over a wide search area [1]. Several methods for radiation source identification have been developed [2,3,4,5,6,7]. For example, Huo et al. reported a method to estimate the location and intensity of radiation sources by using a mobile robot equipped with a Geiger–Müller (GM) counter and a laser range sensor [4]. They investigated the selection of the measurement position by reinforcement learning for the autonomous identification of the radiation sources. In addition, methods to visualize the gamma-ray intensity in a three-dimensional map of the environment have been developed. Vetter et al. demonstrated radiation source detection using both simultaneous localization and mapping (SLAM) based on a light detection and ranging (LiDAR) system and gamma-ray images obtained using gamma imaging methods such as coded-aperture and Compton imaging [5]. Sato et al. applied this method to visualize the gamma-ray intensity at the site of the Fukushima Daiichi Nuclear Power Plant [6,7].

The typical field of view of conventional gamma imaging is less than 180°, whereas 4π Compton imaging is sensitive to gamma rays incident on a detector from all directions. Hence, 4π Compton imaging can identify radiation sources more rapidly than methods based on conventional gamma imaging. In this regard, we developed a 4π gamma-ray imaging system using gadolinium aluminum gallium garnet (GAGG) scintillators [8,9] and CdTe detectors [10,11]. Previous studies demonstrated that the location and activity of hidden gamma-ray sources can be estimated by combining gamma-ray images measured at multiple positions [11,12]. In autonomous source identification by a 4π Compton imager mounted on a robotic vehicle, it is necessary to optimize the measurement procedure, i.e., the measurement positions around the target sources. For this purpose, we proposed a detector movement algorithm for a single source [13,14]. In this study, we developed a path-planning system for radioisotope identification devices by using 4π gamma imaging based on random forest (RF) analysis.

## 2. Investigation of Path-Planning System Using an Integrated Simulation Model

In the source identification method based on 4π gamma imaging [12], a point source is assumed to exist in a certain voxel in a three-dimensional (3D) voxel space, and the source intensity at the pixel in the direction estimated from the gamma image is calculated from the intensity of the gamma image. This calculation is performed for 4π gamma images obtained at several positions around a source, and finally, the source is identified at the position, and with the intensity, for which the results are consistent. To identify radiation sources rapidly using 4π gamma imaging, the images should be obtained from multiple positions that are suitable for obtaining the intensity and position information of the sources. In a previous study, we developed a detector movement algorithm for a single radiation source [13]. The first priority in the algorithm is to move the detector toward the direction with the highest intensity in the 4π gamma image, and the second is to move the detector away from the direction with the highest intensity in the image and toward the direction with the next-highest intensity.

In this study, we investigated a path-planning system for detector movement. An integrated simulation model [13] that estimates the location and intensity of a single gamma source from gamma images at arbitrary positions around the source was used to develop the path-planning system. To obtain gamma images in the simulation, a 3D multipixel array CdTe detector was assumed as a 4π gamma imager. The basic detector response with sufficiently small statistical (counting) uncertainty was obtained by measuring $^{137}$Cs (2 MBq) placed at 100 cm from the center of the detector for 20 min. For any measurement point in the simulation, the gamma image was calculated by rotating the basic response to the direction of a target source and scaling the intensity of the basic response according to the inverse square law. Therefore, the background variation and the uncertainty caused by the counting statistics in the calculated gamma images were not considered in the following discussion.
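The rotate-and-scale step described above can be sketched as follows. This is a minimal illustration, not the simulation code used in this study: the function name `simulated_peak`, the planar geometry, and the reduction of a gamma image to a single peak direction and intensity are assumptions made for brevity. Only the 100 cm reference distance and the inverse-square scaling are taken from the text.

```python
import math

REF_DISTANCE_M = 1.0  # the basic response was measured at 100 cm


def simulated_peak(detector_xy, source_xy, ref_intensity=1.0, relative_activity=1.0):
    """Return (azimuth in rad, intensity) of the image peak for one point source.

    Hypothetical helper: the basic response is represented only by its peak
    intensity `ref_intensity`, measured at REF_DISTANCE_M.
    """
    dx = source_xy[0] - detector_xy[0]
    dy = source_xy[1] - detector_xy[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        raise ValueError("detector and source positions coincide")
    # "Rotation": the peak points from the detector toward the source.
    azimuth = math.atan2(dy, dx)
    # Inverse-square scaling relative to the reference distance.
    intensity = ref_intensity * relative_activity * (REF_DISTANCE_M / distance) ** 2
    return azimuth, intensity
```

For example, moving the detector from 1 m to 2 m away from the source reduces the simulated peak intensity to one quarter of the reference value.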

To extract appropriate features in the RF analysis even when there are two sources, and to create a prediction model that estimates the probability of identification at the next measurement position, simulations were performed for two sources. Figure 1 shows the locations of two $^{137}$Cs point sources and the possible measurement positions around the sources in the search area of the integrated simulation model. The point sources and the possible measurement positions were assumed to be on the same plane. The search area was −8 ≤ X ≤ 8 m and −8 ≤ Y ≤ 0 m. Measurement points A and B were selected from positions S0 to S44 on a 2 m grid in the search area, excluding the two source positions S20 and S24. The intensity ratio of the two sources ranged from 0.1 to 4.9.
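As a rough sketch of this layout, the 45 grid positions can be generated as below. The index order (row by row from the corner at X = −8 m, Y = −8 m) and the names `build_grid`, `GRID`, and `CANDIDATES` are assumptions for illustration; the actual labelling of S0 to S44 is the one defined in Figure 1.

```python
from itertools import product


def build_grid(x_min=-8, x_max=8, y_min=-8, y_max=0, step=2):
    """Map labels S0, S1, ... to (x, y) positions in metres on a square grid."""
    xs = range(x_min, x_max + 1, step)
    ys = range(y_min, y_max + 1, step)
    # Assumed ordering: Y varies slowest, X fastest.
    return {f"S{i}": (x, y) for i, (y, x) in enumerate(product(ys, xs))}


GRID = build_grid()
SOURCE_LABELS = {"S20", "S24"}  # source positions named in the text
# Measurement points A and B are drawn from the remaining positions.
CANDIDATES = {name: xy for name, xy in GRID.items() if name not in SOURCE_LABELS}
```

Under this assumed ordering, S20 and S24 fall at (−4, −4) m and (4, −4) m, i.e., symmetric about the Y axis.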

Simulation of source identification in a 3D voxel space (41 × 41 × 41 voxels, 0.4 m$^{3}$/voxel) was performed under all possible conditions of source intensity ratio and for two measurement positions A and B. The output data were analyzed using RF, a machine learning model. First, to find the features in this RF analysis, the features in the decision tree analysis that were examined in our previous study [13] were selected as the candidates.

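Figure 2 shows the definitions of the eight candidate features: $\left|\overrightarrow{\mathrm{AB}}\right|$, $\left|\overrightarrow{\mathrm{CB}}\cdot\overrightarrow{{\mathrm{CG}}_{\mathrm{est}}}\right|$, $\left|\overrightarrow{{\mathrm{AG}}_{\mathrm{est}}}\right|$, $\left|\overrightarrow{{\mathrm{BG}}_{\mathrm{est}}}\right|$, $\left|\overrightarrow{{\mathrm{CG}}_{\mathrm{est}}}\right|$, $\left|\overrightarrow{{\mathrm{G}}_{\mathrm{pos}}{\mathrm{G}}_{\mathrm{est}}}\right|$, $\frac{\left|\overrightarrow{{\mathrm{AG}}_{\mathrm{est}}}\right|}{\left|\overrightarrow{{\mathrm{BG}}_{\mathrm{est}}}\right|}$, and ∠A$\mathrm{G}_{\mathrm{est}}$B. Here, A and B are the first and second measurement points, respectively, C is the midpoint of line segment AB, $\mathrm{G}_{\mathrm{est}}$ is whichever of the weighted centers of the estimated areas of sources #1 and #2 is closer to point C, and $\mathrm{G}_{\mathrm{pos}}$ is the weighted center of the three points A, B, and $\mathrm{G}_{\mathrm{est}}$.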
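The geometric quantities defined above can be computed from the coordinates of A, B, and $\mathrm{G}_{\mathrm{est}}$ as in the following sketch. The function name `candidate_features`, the dictionary keys, the planar (2D) coordinates, and the use of degrees for the angle are assumptions for illustration. The feature involving $\mathrm{G}_{\mathrm{pos}}$ is omitted because the weights of the weighted center are not specified here.

```python
import math


def candidate_features(a, b, g_est):
    """Geometric features for measurement points a, b and estimated source g_est.

    All arguments are (x, y) tuples in metres (assumed planar geometry).
    """
    c = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)  # midpoint of segment AB
    cb = (b[0] - c[0], b[1] - c[1])
    cg = (g_est[0] - c[0], g_est[1] - c[1])
    ga = (a[0] - g_est[0], a[1] - g_est[1])
    gb = (b[0] - g_est[0], b[1] - g_est[1])
    ag_len = math.hypot(*ga)
    bg_len = math.hypot(*gb)
    if ag_len == 0.0 or bg_len == 0.0:
        raise ValueError("a measurement point coincides with g_est")
    # Angle A-G_est-B, measured at G_est; clamp to guard against rounding.
    cos_angle = (ga[0] * gb[0] + ga[1] * gb[1]) / (ag_len * bg_len)
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return {
        "ab": math.dist(a, b),
        "cb_dot_cg": abs(cb[0] * cg[0] + cb[1] * cg[1]),
        "ag": ag_len,
        "bg": bg_len,
        "cg": math.hypot(*cg),
        "ag_over_bg": ag_len / bg_len,
        "angle_agb_deg": math.degrees(math.acos(cos_angle)),
    }
```

For A = (0, 0), B = (4, 0), and an estimated source at (2, 2), the two measurement points are equidistant from the estimate, so the distance ratio is 1 and the dot-product feature is 0.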

The source was regarded as identified when $\mathrm{G}_{\mathrm{est}}$ was within ±1 m of the true source location and the estimated source intensity was within ±75% of the true source intensity. On this basis, the objective function in the RF analysis was set as “detected” (i.e., “1”) or “not detected” (i.e., “0”).
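A minimal sketch of this labelling rule is given below. The function name `is_identified` is hypothetical, and interpreting "within ±1 m" as a Euclidean distance of at most 1 m is an assumption; a per-axis tolerance would be an equally plausible reading.

```python
import math


def is_identified(est_xy, est_intensity, true_xy, true_intensity,
                  position_tol_m=1.0, intensity_tol=0.75):
    """Return 1 ("detected") or 0 ("not detected") for one simulated case."""
    # Assumed reading of the position criterion: Euclidean distance.
    position_error = math.dist(est_xy, true_xy)
    # Relative intensity error with respect to the true intensity.
    relative_error = abs(est_intensity - true_intensity) / true_intensity
    return int(position_error <= position_tol_m and relative_error <= intensity_tol)
```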

Highly correlated variables, i.e., strong multicollinearity, should be avoided to achieve better accuracy and feature selection in the RF analysis. The variance inflation factor (VIF) of the $j$th variable $X_{j}$, defined by Equation (1), represents the degree to which one variable is related to the other variables, and multicollinearity is suspected if the VIF is greater than 10.

$$\mathrm{VIF}_{j}=\frac{1}{1-{R}_{j}^{2}} \tag{1}$$

with

$${R}_{j}^{2}=1-\frac{\sum_{i=1}^{n}{\left({X}_{ji}-{f}_{ji}\right)}^{2}}{\sum_{i=1}^{n}{\left({X}_{ji}-\overline{{X}_{j}}\right)}^{2}}$$

Here, ${R}_{j}^{2}$ is the coefficient of determination of the regression of $X_{j}$ on all other remaining variables, ${f}_{ji}$ is the ordinary least squares regression estimate of $X_{j}$ for the $i$th data point, ${X}_{ji}$ is the value of $X_{j}$ for the $i$th data point, $\overline{{X}_{j}}$ is the mean of $X_{j}$, and $n$ is the number of data points obtained by the simulation. Among the eight variables, $\left|\overrightarrow{{\mathrm{CG}}_{\mathrm{est}}}\right|$, $\left|\overrightarrow{{\mathrm{BG}}_{\mathrm{est}}}\right|$, and $\left|\overrightarrow{{\mathrm{G}}_{\mathrm{pos}}{\mathrm{G}}_{\mathrm{est}}}\right|$ had very large calculated VIFs; hence, these three variables were removed. The VIFs recalculated for the remaining five variables are listed in Table 1. The VIF of each feature became smaller (weaker correlation). Since $\mathrm{G}_{\mathrm{est}}$ is derived from the estimated source identification result, its uncertainty is likely to be larger than the uncertainties of A and B, which can be measured. Therefore, we selected $\left|\overrightarrow{\mathrm{AB}}\right|$ and $\left|\overrightarrow{\mathrm{CB}}\cdot\overrightarrow{{\mathrm{CG}}_{\mathrm{est}}}\right|$ as the features, and calculated the VIFs using each of $\left|\overrightarrow{{\mathrm{AG}}_{\mathrm{est}}}\right|$, $\frac{\left|\overrightarrow{{\mathrm{AG}}_{\mathrm{est}}}\right|}{\left|\overrightarrow{{\mathrm{BG}}_{\mathrm{est}}}\right|}$, and ∠A$\mathrm{G}_{\mathrm{est}}$B as an additional variable. The VIF was the lowest when $\frac{\left|\overrightarrow{{\mathrm{AG}}_{\mathrm{est}}}\right|}{\left|\overrightarrow{{\mathrm{BG}}_{\mathrm{est}}}\right|}$ was included as a feature (see Table 2).
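The VIF screening can be illustrated with the short NumPy sketch below, which regresses each column on all the others by ordinary least squares and converts the resulting coefficient of determination into a VIF. The function name `variance_inflation_factors` and the inclusion of an intercept term in each regression are assumptions; this is not the code used to produce Tables 1 and 2.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of every column of X (rows: data points, columns: variables)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Design matrix: intercept plus all remaining variables.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        fitted = design @ coef
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        # A perfect fit means exact collinearity, i.e. an unbounded VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs
```

A variable with a VIF above 10 would be a candidate for removal, after which the VIFs of the remaining variables are recalculated, as was done for the eight candidate features.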

Parameters in the RF analysis were tuned based on grid search and cross-validation. In this analysis, the number of trees was fixed at 50, and the optimal combination of two parameters, namely, the tree depth and the minimum number of data for nodes, was searched for with the tree depth set to 1, 2, 3, and 4, and the minimum number of data for nodes set to 1, 3, 5, 7, and 10. After tuning, a model was created with the tree depth and the minimum number of data in a node set to 4 and 10, respectively. Finally, a prediction model with an accuracy of 86% was built. The importance of the features of the extracted decision trees is summarized in Table 3.
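A tuning procedure of this kind could look like the scikit-learn sketch below. The library choice, the mapping of "minimum number of data for nodes" to `min_samples_leaf`, the five-fold cross-validation, and the helper name `tune_random_forest` are all assumptions; the text does not state which implementation or settings were used beyond the parameter values searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Parameter values searched in the text; the parameter names are assumed.
PARAM_GRID = {
    "max_depth": [1, 2, 3, 4],
    "min_samples_leaf": [1, 3, 5, 7, 10],
}


def tune_random_forest(features, labels, cv=5, seed=0):
    """Grid-search a 50-tree random forest and return the fitted search object."""
    forest = RandomForestClassifier(n_estimators=50, random_state=seed)
    search = GridSearchCV(forest, PARAM_GRID, cv=cv, scoring="accuracy")
    search.fit(np.asarray(features), np.asarray(labels))
    return search
```

After fitting, `search.best_params_` holds the selected combination, `search.best_score_` the mean cross-validated accuracy, and `search.best_estimator_.feature_importances_` the feature importances of the refitted model.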

To understand the created prediction model, we extracted typical decision trees with feature importance similar to that of the whole model with 50 decision trees. The output of RF was determined via ensemble learning using multiple decision trees. Here, the Python implementation of the CART (classification and regression trees) algorithm for decision trees was used. The model was constructed by recursively partitioning the training data through hierarchical conditional branching. The extracted decision trees are shown in Figure 3. The pie charts in this figure show the percentages of data for which the source was found at the node, and $n$ represents the number of data in each node.

On the left node ($\left|\overrightarrow{\mathrm{CB}}\cdot\overrightarrow{{\mathrm{CG}}_{\mathrm{est}}}\right|\le 4.0$) in Figure 3, if $\frac{\left|\overrightarrow{{\mathrm{AG}}_{\mathrm{est}}}\right|}{\left|\overrightarrow{{\mathrm{BG}}_{\mathrm{est}}}\right|}\le 0.79$ and $\left|\overrightarrow{\mathrm{CB}}\cdot\overrightarrow{{\mathrm{CG}}_{\mathrm{est}}}\right|\le 3.1$, the probability of identifying the source is high. This indicates that measurement points at which the contribution from one source is larger than that from the other, and which provide parallax with respect to the source, are preferred. In other words, it is better to find the sources one by one. On the right node ($\left|\overrightarrow{\mathrm{CB}}\cdot\overrightarrow{{\mathrm{CG}}_{\mathrm{est}}}\right|>4.0$), the probability is high for $\left|\overrightarrow{\mathrm{CB}}\cdot\overrightarrow{{\mathrm{CG}}_{\mathrm{est}}}\right|>34,\left|\overrightarrow{\mathrm{AB}}\right|9.4$. This is the case when one measurement position is close to the source and the other is far from it, which again provides parallax with respect to the source. Therefore, the measurement positions suggested by the prediction model are consistent with those preferred by the detector movement algorithm for a single source, as discussed in our previous studies [13,14].

We used this prediction model to let the path-planning system decide the next measurement position. Figure 4 shows the flowchart of the proposed path-planning approach to move the detector for identifying radiation sources. After measurement at a certain measurement point A, the location and intensity of the radiation source(s) are estimated according to the identification principle. If the source is not identified with a sufficiently small uncertainty, the path-planning system selects the next measurement point B from eight candidate positions around A. The path-planning system employs the SLAM results to determine whether the detector can move to each of the eight candidate positions. For the reachable candidates, the probability of identification is estimated by the prediction model using the input features, and the candidate with the highest probability is selected. Then, the detector is moved, and the 4π gamma image is measured. This process is repeated until the source is identified. When several candidate positions had the same predicted probability, the next measurement position was selected based on the detector movement algorithm reported in our previous study [13].
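The selection step of this flowchart can be sketched as follows. Everything named here is hypothetical: `predict_probability` stands in for the trained prediction model, `is_reachable` for the SLAM-based reachability check, `tie_breaker` for the single-source detector movement algorithm [13], and the 2 m spacing of the eight neighbouring candidates is assumed from the simulation grid.

```python
def candidate_positions(a, step=2.0):
    """Eight neighbours of point a on a square grid (spacing is an assumption)."""
    return [(a[0] + dx * step, a[1] + dy * step)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def choose_next_point(a, predict_probability, is_reachable, tie_breaker=None, step=2.0):
    """Pick the next measurement point B, or None if no candidate is reachable."""
    reachable = [b for b in candidate_positions(a, step) if is_reachable(b)]
    if not reachable:
        return None
    # Score every reachable candidate with the prediction model.
    scored = [(predict_probability(a, b), b) for b in reachable]
    best = max(p for p, _ in scored)
    tied = [b for p, b in scored if p == best]
    # Equal probabilities: defer to the fallback movement rule if one is given.
    if len(tied) > 1 and tie_breaker is not None:
        return tie_breaker(a, tied)
    return tied[0]
```

In a full loop, this function would be called after each measurement until the identification criterion is met, with the features for `predict_probability` recomputed from the latest source estimate.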