# Neural Network Approaches for Mobile Spectroscopic Gamma-Ray Source Detection


## Abstract


## 1. Introduction

- Introduce the use of ANN-based spectral anomaly detection and show improvements over simpler linear models.
- Evaluate current state-of-the-art identification networks under operationally relevant conditions, and benchmark against a non-ANN method.
- Improve upon state-of-the-art identification networks by introducing the use of recurrent neural networks.
- Provide a comprehensive description of neural networks for detection and identification that, when accompanied by quantitative results, better informs practitioners of current tradeoffs.

## 2. Methods

#### 2.1. Artificial Neural Networks

#### 2.2. Spectral Anomaly Detection using Autoencoders

When trained on background data, an autoencoder learns to reconstruct persistent spectral features, such as natural background photopeaks (e.g., ^{40}K at 1460 keV) and the associated downscattering continuum, seen in Figure 1.
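One natural anomaly score under the Poisson counting model noted in Figure 1 is the deviance between the measured spectrum and its autoencoder reconstruction. The sketch below is illustrative; the function name and the exact metric are assumptions, not necessarily the metric used in the paper.

```python
import numpy as np

def poisson_anomaly_score(x, x_hat, eps=1e-12):
    """Poisson deviance of the observed spectrum x with respect to the
    reconstructed mean rates x_hat (illustrative anomaly score).

    Bins consistent with the reconstruction contribute little; an
    injected source produces an excess that inflates the score.
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.clip(np.asarray(x_hat, dtype=float), eps, None)
    # Deviance: 2 * sum(x*log(x/x_hat) - (x - x_hat)), with 0*log(0) := 0
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(x > 0, x * np.log(x / x_hat), 0.0)
    return 2.0 * np.sum(term - (x - x_hat))

# A spectrum matching its reconstruction scores near zero;
# a spectrum with an excess of counts scores higher.
background = np.array([50.0, 30.0, 10.0, 5.0])
assert poisson_anomaly_score(background, background) < 1e-9
assert poisson_anomaly_score(background + np.array([0, 20, 0, 0]), background) > 1.0
```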

#### 2.3. Source Identification

#### 2.4. Performance Evaluation and Data

The 17 source types are ^{198}Au, ^{133}Ba, ^{82}Br, ^{57}Co, ^{60}Co, ^{137}Cs, ^{152}Eu, ^{123}I, ^{131}I, ^{111}In, ^{192}Ir, ^{54}Mn, ^{124}Sb, ^{46}Sc, ^{75}Se, ^{113}Sn, and ^{201}Tl. Figure 4 gives examples of spectra from low-activity ^{60}Co and ^{137}Cs sources compared to randomly-sampled background spectra. Note that the injected sources are simulated independently in vacuum, meaning the effects of environmental scattering or occlusions are not contained in the resulting spectra. In modeling the kinematics of the detector passing the source, a vehicle speed of $v=5$ m/s along a straight line and a standoff distance ${r}_{0}=10$ m are used. According to information provided as part of the competition, the detector speed used in generating a given background run was a constant value between 1 m/s and 13.1 m/s; however, the speed for each individual run was not provided, so the speed $v$ used in modeling the source kinematics is not necessarily the same. While not ideal, this discrepancy is not believed to affect the conclusions drawn from these analyses, as the speed used here (5 m/s) lies within the range of values used to produce the background data.
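Under these kinematic assumptions, the source-to-detector distance and the resulting $1/r^2$ rate falloff during a drive-by can be sketched in a few lines. Detector response and solid-angle details are omitted; this is only the geometric factor.

```python
import numpy as np

def standoff(t, v=5.0, r0=10.0):
    """Distance from detector to source at time t (s) for a straight-line
    pass at speed v (m/s) with closest approach r0 (m) at t = 0."""
    return np.hypot(r0, v * t)

def relative_rate(t, v=5.0, r0=10.0):
    """Source count rate relative to the closest-approach rate, using a
    bare 1/r^2 falloff (the sources are simulated in vacuum, so no
    environmental scattering or occlusion is modeled)."""
    return (r0 / standoff(t, v, r0)) ** 2

t = np.linspace(-10, 10, 201)  # +/- 10 s around closest approach
rates = relative_rate(t)
assert np.isclose(rates.max(), 1.0)         # peak at closest approach (t = 0)
assert np.isclose(relative_rate(2.0), 0.5)  # v*t = 10 m, so r^2 doubles
```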

#### 2.5. Model Optimization

#### 2.5.1. Training, Validation, and Early Stopping

#### 2.5.2. Data Preprocessing and Batch Normalization

#### 2.5.3. Optimizer and Regularization

#### 2.5.4. Hyperparameter Optimization

#### 2.6. Benchmarking

## 3. Results

#### 3.1. Anomaly Detection

#### 3.2. Identification

## 4. Conclusions

The presence of a ^{137}Cs source corresponds to an excess of counts associated with the ^{137}Cs template, which an operator can interpret and act on. However, it is not clear how identification networks could be interpreted, due to the number of interconnected parameters involved in making a decision; the operator must simply trust that the network is behaving correctly. Additional research is needed in the area of interpretability of spectral models, for example, understanding convolutional kernels as in ref. [13], or generating saliency maps that relate the most significant input features to a given network output.
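As a concrete illustration of the saliency-map idea: for a single-layer softmax classifier the gradient of a class probability with respect to the input spectrum has a closed form, and for a deep identification network the analogous quantity is obtained by backpropagation. This sketch is illustrative only and is not the method of ref. [13].

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def saliency(x, W, b, target):
    """Gradient of the target-class softmax probability with respect to
    the input spectrum x, for a single-layer classifier (W, b).
    Large-magnitude entries mark the energy bins that most influence
    the network's decision for that class.
    """
    p = softmax(W @ x + b)
    # d p_t / d x = p_t * (W_t - sum_k p_k W_k)
    return p[target] * (W[target] - p @ W)

# Sanity check against a central finite difference.
rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 8)), rng.normal(size=3), rng.normal(size=8)
g = saliency(x, W, b, target=1)
eps = 1e-6
numeric = np.array([
    (softmax(W @ (x + eps * e) + b)[1] - softmax(W @ (x - eps * e) + b)[1]) / (2 * eps)
    for e in np.eye(8)
])
assert np.allclose(g, numeric, atol=1e-6)
```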

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Basic Neural Network Elements

#### Appendix A.1. Fully-Connected Layers

#### Appendix A.2. Convolutional Layers

## Appendix B. Source Data Preparation

## References

- Fagan, D.K.; Robinson, S.M.; Runkle, R.C. Statistical methods applied to gamma-ray spectroscopy algorithms in nuclear security missions. Appl. Radiat. Isot. **2012**, 70, 2428–2439.
- Olmos, P.; Diaz, J.C.; Perez, J.M.; Gomez, P.; Rodellar, V.; Aguayo, P.; Bru, A.; Garcia-Belmonte, G.; de Pablos, J.L. A New Approach to Automatic Radiation Spectrum Analysis. IEEE Trans. Nucl. Sci. **1991**, 38, 971–975.
- Kamuda, M.; Stinnett, J.; Sullivan, C.J. Automated Isotope Identification Algorithm Using Artificial Neural Networks. IEEE Trans. Nucl. Sci. **2017**, 64, 1858–1864.
- Cosofret, B.R.; Shokhirev, K.; Mulhall, P.; Payne, D.; Harris, B. Utilization of advanced clutter suppression algorithms for improved standoff detection and identification of radionuclide threats. In Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE) Sensing XV; SPIE: Baltimore, MD, USA, 2014; Volume 9073, pp. 253–265.
- Joshi, T.H.; Cooper, R.J.; Curtis, J.; Bandstra, M.; Cosofret, B.R.; Shokhirev, K.; Konno, D. A Comparison of the Detection Sensitivity of the Poisson Clutter Split and Region of Interest Algorithms on the RadMAP Mobile System. IEEE Trans. Nucl. Sci. **2016**, 63, 1218–1226.
- Bilton, K.J.; Joshi, T.H.; Bandstra, M.S.; Curtis, J.C.; Quiter, B.J.; Cooper, R.J.; Vetter, K. Non-negative Matrix Factorization of Gamma-Ray Spectra for Background Modeling, Detection, and Source Identification. IEEE Trans. Nucl. Sci. **2019**, 66, 827–837.
- Olmos, P.; Diaz, J.C.; Perez, J.M.; Aguayo, P.; Gomez, P.; Rodellar, V. Drift problems in the automatic analysis of gamma-ray spectra using associative memory algorithms. IEEE Trans. Nucl. Sci. **1994**, 41, 637–641.
- Pilato, V.; Tola, F.; Martinez, J.; Huver, M. Application of neural networks to quantitative spectrometry analysis. Nucl. Instrum. Methods Phys. Res. Sect. A **1999**, 422, 423–427.
- Yoshida, E.; Shizuma, K.; Endo, S.; Oka, T. Application of neural networks for the analysis of gamma-ray spectra measured with a Ge spectrometer. Nucl. Instrum. Methods Phys. Res. Sect. A **2002**, 484, 557–563.
- Chen, L.; Wei, Y.X. Nuclide identification algorithm based on K-L transform and neural networks. Nucl. Instrum. Methods Phys. Res. Sect. A **2009**, 598, 450–453.
- Kim, J.; Lim, K.T.; Kim, J.; Kim, C.J.; Jeon, B.; Park, K.; Kim, G.; Kim, H.; Cho, G. Quantitative analysis of NaI(Tl) gamma-ray spectrometry using an artificial neural network. Nucl. Instrum. Methods Phys. Res. Sect. A **2019**, 944, 162549.
- Kamuda, M.; Zhao, J.; Huff, K. A comparison of machine learning methods for automated gamma-ray spectroscopy. Nucl. Instrum. Methods Phys. Res. Sect. A **2020**, 954, 161385.
- Daniel, G.; Ceraudo, F.; Limousin, O.; Maier, D.; Meuris, A. Automatic and Real-Time Identification of Radionuclides in Gamma-Ray Spectra: A New Method Based on Convolutional Neural Network Trained With Synthetic Data Set. IEEE Trans. Nucl. Sci. **2020**, 67, 644–653.
- Moore, E.T.; Ford, W.P.; Hague, E.J.; Turk, J. An Application of CNNs to Time Sequenced One Dimensional Data in Radiation Detection. arXiv **2019**, arXiv:1908.10887.
- Moore, E.T.; Turk, J.L.; Ford, W.P.; Hoteling, N.J.; McLean, L.S. Transfer Learning in Automated Gamma Spectral Identification. arXiv **2020**, arXiv:2003.10524.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. **2012**, 25, 1097–1105.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning Internal Representations by Error Propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations; Rumelhart, D.E., McClelland, J.L., Eds.; MIT Press: Cambridge, MA, USA, 1986; pp. 318–362.
- Nelder, J.A.; Wedderburn, R.W.M. Generalized Linear Models. J. R. Stat. Soc. Ser. A (General) **1972**, 135, 370–384.
- Elman, J.L. Finding Structure in Time. Cogn. Sci. **1990**, 14, 179–211.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. **1997**, 9, 1735–1780.
- Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv **2014**, arXiv:1406.1078.
- Ghawaly, J.M.; Nicholson, A.D.; Peplow, D.E.; Anderson-Cook, C.M.; Myers, K.L.; Archer, D.E.; Willis, M.J.; Quiter, B.J. Data for training and testing radiation detection algorithms in an urban environment. Sci. Data **2020**, 7, 328.
- Nicholson, A.; Peplow, D.; Anderson-Cook, C.; Greulich, C.; Ghawaly, J.; Myers, K.; Archer, D.; Willis, M.; Quiter, B. Data for Training and Testing Radiation Detection Algorithms in an Urban Environment. Sci. Data **2020**, 7, 328.
- Agostinelli, S. Geant4—A simulation toolkit. Nucl. Instrum. Methods Phys. Res. Sect. A **2003**, 506, 250–303.
- LeCun, Y.; Bottou, L.; Orr, G.B.; Müller, K.R. Efficient BackProp. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 1998; pp. 9–50.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv **2014**, arXiv:1412.6980.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. **2014**, 15, 1929–1958.
- Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. **2012**, 13, 281–305.
- Tandon, P.; Huggins, P.; Maclachlan, R.; Dubrawski, A.; Nelson, K.; Labov, S. Detection of radioactive sources in urban scenes using Bayesian Aggregation of data from mobile spectrometers. Inf. Syst. **2016**, 57, 195–206.
- Weisstein, E.W. Bonferroni Correction. Available online: https://mathworld.wolfram.com/BonferroniCorrection.html (accessed on 20 March 2021).
- Bandstra, M.S.; Joshi, T.H.Y.; Bilton, K.J.; Zoglauer, A.; Quiter, B.J. Modeling Aerial Gamma-Ray Backgrounds Using Non-negative Matrix Factorization. IEEE Trans. Nucl. Sci. **2020**, 67, 777–790.
- Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity Checks for Saliency Maps. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18); Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 9525–9536.
- Khan, S.; Rahmani, H.; Shah, S.A.A.; Bennamoun, M.; Medioni, G.; Dickinson, S. A Guide to Convolutional Neural Networks for Computer Vision; Morgan & Claypool Publishers: San Rafael, CA, USA, 2018.
- Celik, C.; Peplow, D.E.; Davidson, G.G.; Swinney, M.W. A Directional Detector Response Function for Anisotropic Detectors. Nucl. Sci. Eng. **2019**, 193, 1355–1370.

**Figure 1.** (**Left panel**) Diagram showing the dimensionality of features ${\mathbf{h}}^{\left(i\right)}$ at each layer $i$ for an example dense autoencoder architecture with five hidden layers. A 128-bin spectrum is input into the autoencoder, and dense layers are computed by performing nonlinear transformations on each preceding layer. The inverse of each operation is then performed to decode the latent features, resulting in a smoothed spectrum. (**Right panel**) An input background spectrum $\mathbf{x}$ and corresponding autoencoder reconstruction $\widehat{\mathbf{x}}$ are shown. When trained on background, the autoencoder learns spectral features such as background peaks and the associated downscattering continuum. Both the input and output spectra shown here contain 128 bins with widths that scale with the square root of energy. Note that any apparent deviations between the input and output spectra (e.g., at the 1460 keV peak) are due to low statistics, as the bins of the measured spectrum $\mathbf{x}$ are discrete random samples of the mean Poisson rate $\widehat{\mathbf{x}}$.
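Bin widths that scale with the square root of energy can be constructed by squaring a uniform grid in $\sqrt{E}$, since $dE/dn \propto \sqrt{E}$ implies $\sqrt{E}$ is linear in bin index. The 30–3000 keV range below is an illustrative assumption; the paper's exact energy range is not stated here.

```python
import numpy as np

def sqrt_energy_edges(e_min, e_max, n_bins=128):
    """Bin edges whose widths grow with the square root of energy:
    a uniform grid in sqrt(E), squared back into energy units."""
    return np.linspace(np.sqrt(e_min), np.sqrt(e_max), n_bins + 1) ** 2

edges = sqrt_energy_edges(30.0, 3000.0, 128)   # assumed keV range
widths = np.diff(edges)
assert edges.size == 129                        # 128 bins -> 129 edges
assert np.all(widths > 0) and widths[-1] > widths[0]  # widths grow with energy
```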

**Figure 2.** Example architecture of a convolutional identification network, similar to refs. [12,13]. A 2-dimensional feature map resulting from the convolutional operations is flattened into a single feature vector of length 1024, which is reduced down to the output size of 18 (17 sources, 1 background channel). A max-pooling operation is applied to the features resulting from the convolutional operation, reducing the feature size from 128 to 64. In the case of an RNN, the dense layer with size (1, 128) at time $t$ is fed back to combine with the previous layer at time $t+1$. The output is fed into a softmax function, not shown here.
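The tensor shapes in this architecture can be traced in a few lines. The kernel count of 16 is an inference from $1024 = 16 \times 64$, not a value stated in the caption.

```python
import numpy as np

# Shape walkthrough for the Figure 2 identification network,
# assuming 16 convolutional kernels (inferred, since 16 * 64 = 1024).
n_bins, n_kernels, n_outputs = 128, 16, 18

x = np.zeros(n_bins)                     # input spectrum: (128,)
feat = np.zeros((n_kernels, n_bins))     # after "same"-padded convolution: (16, 128)
pooled = feat.reshape(n_kernels, -1, 2).max(axis=2)  # max-pool by 2: (16, 64)
flat = pooled.reshape(-1)                # flattened feature vector: (1024,)
hidden = np.zeros(128)                   # dense layer (1, 128); fed back at t+1 in the RNN
logits = np.zeros(n_outputs)             # 17 sources + 1 background channel

assert pooled.shape == (16, 64)
assert flat.shape == (1024,)
```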

**Figure 3.**Counts per second as a function of time for the first 60 s of three randomly-sampled runs of background data, illustrating variability in count rates and temporal signatures.

**Figure 4.** Comparison of two injection spectra. The left pane shows a random background spectrum and a random Poisson sample of a 50-$\mu $Ci ^{60}Co source at a 5 m standoff, each having a 1-s integration time. The right pane shows a different random background spectrum and a Poisson sample of a ^{137}Cs source with the same parameters as the previous. The ratio of source-to-total counts is 0.11 for ^{60}Co and 0.04 for ^{137}Cs. Due to having few counts, these spectra do not contain the familiar peak behavior, and instead show small clusters of counts at the characteristic energies of the sources (i.e., 1173 and 1332 keV for ^{60}Co and 662 keV for ^{137}Cs).

**Figure 5.** Histogram of mean MDA for autoencoders evaluated on the validation set. Each of the 40 models was trained using a random value of the L2 regularization coefficient $\lambda $ and a random configuration of the number of neurons in the dense layers of the network. This figure shows that, despite being trained with different parameters, initial weights, and mini-batches, most models were able to yield similar performance. The model with the lowest mean MDA in this figure is examined further on the test set.
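The random hyperparameter search described in the caption can be sketched as follows. The L2 range and the candidate layer widths below are illustrative assumptions, not the values used in the paper.

```python
import random

# Each of 40 models draws a random L2 coefficient and dense-layer widths,
# then would be trained and scored by mean MDA on the validation set.
random.seed(0)
configs = [
    {
        "l2": 10 ** random.uniform(-6, -2),  # log-uniform L2 coefficient (assumed range)
        "widths": random.choice([(64, 32), (64, 16), (32, 16), (32, 8)]),  # assumed options
    }
    for _ in range(40)
]
assert len(configs) == 40
assert all(1e-6 <= c["l2"] <= 1e-2 for c in configs)
```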

**Figure 6.** Comparison of MDA for the three detection methods across all 17 sources at a 1/8 h${}^{-1}$ FAR. Sources are sorted in ascending order of MDA for the baseline NMF method. Each model was evaluated by injecting each source type, across activities, into each run of the background test set and computing the MDA. The background used (the test set) was separate from the training and validation background, and thus gives a sense of how well each model generalizes to unseen background data. Note that the discrepancy between the PCA-based method and the other two is likely due to the detection metric used for the PCA-based approach, which comes from the literature. The error bars shown were computed by propagating the uncertainties of $\mu $ and $\sigma $ from Equation (8), estimated from the least-squares fitting routine. Note that there is an overlap between DAE and PCA for ^{111}In and between NMF and PCA for ^{75}Se.

**Figure 7.** Histogram of mean MDA for RNN and feedforward (FF) ID networks evaluated on the validation set. Each of the 40 models was trained using random values of the L2 regularization coefficient $\lambda $, the number of kernels in the convolutional layer, and the number of neurons in the first dense layer. This distribution shows a general trend of improvement when using recurrent layers. The models with the lowest mean MDA for the feedforward and recurrent networks are examined further on the test set.

**Figure 8.** Comparison of the three methods evaluated on the test set: NMF-based identification, a feedforward network (FF), and an RNN-based identification method. Sources are sorted in ascending order of MDA for the baseline NMF method. A total FAR of approximately 1/8 h${}^{-1}$ across all sources is achieved by setting the threshold for each source individually, based on an effective per-source FAR of 1/(8 × 17) h${}^{-1}$. The RNN is seen to generally provide an improvement over its feedforward counterpart, though there are a few notable exceptions, such as ^{133}Ba. Note that there is an overlap of points between NMF and FF for ^{60}Co, ^{123}I, ^{131}I, and ^{192}Ir.
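The per-source threshold allocation described in the Figure 8 caption is a Bonferroni-style split of the total false-alarm budget across the 17 source channels. A minimal sketch (the 1-s decision interval is an assumption for illustration):

```python
# Splitting a total false-alarm budget of ~1/8 per hour equally
# across 17 source channels (Bonferroni-style correction).
n_sources = 17
total_far_per_hour = 1.0 / 8.0
per_source_far = total_far_per_hour / n_sources        # 1/(8*17) per hour
assert abs(per_source_far - 1.0 / 136.0) < 1e-12

# With 1-s decision intervals (assumed), the per-decision
# false-alarm probability for each source channel is:
per_decision_p = per_source_far / 3600.0
assert per_decision_p < 1e-5
```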

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bilton, K.J.; Joshi, T.H.Y.; Bandstra, M.S.; Curtis, J.C.; Hellfeld, D.; Vetter, K.
Neural Network Approaches for Mobile Spectroscopic Gamma-Ray Source Detection. *J. Nucl. Eng.* **2021**, *2*, 190-206.
https://doi.org/10.3390/jne2020018
