1. Introduction
The unauthorized use of nuclear materials poses a significant threat to public security and social stability and requires effective interception at customs and border checkpoints. While detectors such as low-cost Sodium Iodide (NaI) and high-performance alternatives are available, their performance may degrade under conditions of low material concentration or when multiple isotopes are present.
Recent advances in machine learning and deep learning have shown great potential for improving the detection and classification of low-concentration nuclear materials in mixtures [
1,
2,
3,
4]. Nevertheless, several challenges remain. First, machine learning (ML) models typically require large amounts of training data. Simulation tools such as the Gamma Detector Response and Analysis Software (GADRAS) [5] and GEometry ANd Tracking (Geant4) [6,7,8,9] can generate synthetic training data. However, GADRAS is available only to US government employees and contractors, and each license is limited to one person. Geant4 is powerful, but it offers many functionalities that may not be needed for mixture identification, it has a steep learning curve, and it lacks the user-friendly interface of GADRAS. Simpler and license-free mixture generation software would help researchers working in nuclear power plants and private medical radiation research facilities to experiment with new identification algorithms. Second, the spectral data acquired by detectors typically contain background noise and various interferences [
10], thereby requiring the development of robust algorithms for accurate material classification. Third, the presence of multiple nuclear materials may further complicate the spectral signatures and increase the difficulty of accurate detection. Spectral unmixing may be needed to accurately classify nuclear materials and estimate their relative count contributions within mixtures [
11,
12].
Traditional radiation spectrum analysis relies on examining specific regions of interest (ROIs) within the gamma ray spectrum [
13,
14,
15,
16]. As mentioned in [
17,
18], a key limitation of these approaches is their reduced performance when ROIs overlap significantly, as occurs with large libraries of radioisotopes. Recently, researchers have taken the entire spectrum into account for isotope identification, improving the accuracy of detecting and quantifying various isotopes [
19,
20]. A key advantage is that the Compton continuum can be taken into account, and using the entire spectrum has been shown to provide some tolerance to gain shift.
Several conventional methods have been developed for analyzing spectral signatures from material mixtures. Non-negativity Constrained Least Squares (NCLS) has been applied to chemical agent detection [
21]. Partial Least Squares (PLS) has been utilized for rock composition analysis in Laser-Induced Breakdown Spectroscopy (LIBS) [
22]. The Deep Belief Network (DBN) has been applied to hyperspectral image classification tasks [
23]. In addition, linear regression (LR) and random forest regression (RFR) [
24] are also conventional machine learning tools that have been employed for unmixing analysis. A 2019 doctoral dissertation [
18] applied deep learning techniques for isotope classification. However, the study did not address the detection or classification of mixtures.
Deep learning has achieved remarkable progress since the seminal work of Hinton’s group in 2012 [
25]. Since then, deep learning has been widely applied across various domains, including target detection and classification [
26,
27], stock market forecasting [
28], land cover classification [
29], image enhancement [
30], and many others [
31].
Although GADRAS can generate realistic synthetic mixture spectra for training ML algorithms, its user interface is not designed for large-scale spectrum generation, and generating thousands of mixtures with it is tedious and labor intensive. An efficient and fast mixture generation framework is therefore needed.
In this paper, we modified the single-isotope spectrum data generation framework known as artificial neural networks for spectroscopic analysis (ANNSA) [
18], developed by researchers at the University of Illinois at Urbana-Champaign, for multi-isotope mixture generation. The data generation framework in [18] was originally used for single-source spectrum generation at different signal-to-background ratios with several data augmentation parameters, and it was used for isotope identification. We modified this framework for multi-isotope mixture generation and for quantification (mixing ratio estimation) of the isotopes present in these mixtures. One key reason for the modification is to make it easy to generate thousands of mixtures for our experiments. Generating thousands of mixtures with GADRAS would be tedious and time consuming because some parameters may need to be entered manually. Moreover, as mentioned earlier, GADRAS is limited to US government employees and contractors, so civilians such as medical researchers may not be able to obtain licenses. The modified framework integrates several augmentation parameters into the spectrum generation process, such as integration time, background count rate, signal-to-background ratio, and calibration. With it, one can form multi-isotope mixtures at a user-set signal-to-background ratio together with several other detector and augmentation parameters, such as shielding and shielding density. The framework outputs the foreground and background spectra. It then generates the measured spectrum (foreground + background) by applying a Poisson process, which produces a measured mixture spectrum with realistic counting statistics. Across the three datasets investigated (homogeneous, slightly heterogeneous, and heterogeneous), one deep learning-based algorithm outperformed the other methods by yielding lower root-mean-squared error (RMSE) values.
The main contributions of this work are summarized as follows. First, we propose a novel and efficient framework for rapid generation of mixture spectra. Second, we investigate conventional and deep learning algorithms for relative count contribution estimation on mixtures generated with the proposed framework. Third, we apply the proposed framework and various algorithms to uranium enrichment-level prediction.
The remainder of this paper is organized as follows. Section 2 summarizes the fast mixture spectra generation framework. Section 3 summarizes the investigations of two-mixture mixing ratio estimation results using several ML/DL algorithms. Section 4 includes one application of the proposed framework to uranium enrichment-level estimation. Finally, Section 5 provides concluding remarks.
  2. Multi-Isotope Spectrum Mixture Generation
  2.1. Background
A related study [18] investigated the identification of isotopes in gamma ray spectra. Unlike our work, [18] considered single-isotope detection and did not address isotope mixtures or quantification. In that work, isotope and background gamma ray spectra were simulated using GADRAS with a custom NaI detector while varying detector parameters such as the source–detector distance, detector height, FWHM (full width at half maximum), shielding material, and shielding density. The simulated isotope and background spectra served as templates. In addition, [18] used several augmentation parameters so that augmented spectra of the single-isotope templates could be created with different background count rates, signal-to-background ratios, integration times, calibrations, etc. We adapted this single-isotope generation framework for multiple-isotope mixture generation at a user-set signal-to-background ratio (SBR). The modified framework can thus be used not only for isotope identification but also for quantification of the isotopes in a mixture. Before using these foreground and background templates to create isotope mixtures, we first investigated how GADRAS combines multiple single-isotope spectra (each with its own activity) when forming a mixed spectrum.
  2.2. Examining Spectrum Mixing in GADRAS
For this investigation, the following isotope simulations (single isotopes at different activities and a three-isotope mixture) were conducted in GADRAS with an NaI detector (detector height = 56 cm, detector-to-material distance = 122 cm, no Poisson noise):
- 137Cs, 9.7687 uCi (single isotope); 
- 223Ra, 790.07 nCi (single isotope); 
- 235U, 977.4 nCi (single isotope); 
- 137Cs, 150 uCi (single isotope); 
- 137Cs, 9.7687 uCi + 223Ra, 790.07 nCi + 235U, 977.4 nCi (three-source mixture). 
The scaling operation is tested by multiplying the “137Cs, 150 uCi” spectrum by the scalar coefficient 9.7687/150 (since 150 uCi × (9.7687/150) = 9.7687 uCi) and checking whether the resulting spectrum equals the GADRAS-simulated spectrum for “137Cs, 9.7687 uCi”. As shown in Figure 1, the two spectra (computed (dotted blue line) and simulated (blue line)) are almost identical.
The mixing operation is tested by adding the three separate GADRAS-simulated spectra “137Cs, 9.7687 uCi”, “223Ra, 790.07 nCi”, and “235U, 977.4 nCi” and checking whether the computed spectrum equals the GADRAS-simulated spectrum for the three-isotope mixture “137Cs, 9.7687 uCi + 223Ra, 790.07 nCi + 235U, 977.4 nCi”. As shown in Figure 2, the two spectra are almost identical. This shows that, when forming a mixture of multiple isotopes, GADRAS adds the spectra of the individual isotopes at their respective activities, which indicates linear mixing.
In summary, as we anticipated, if template spectra with various detector parameter variations are generated for individual isotopes for a specific detector, or the isotope and background templates from [
18] are used, simple addition and multiplication operators can be used to simulate multiple-isotope mixtures using these individual source and background templates.
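The two checks above can be sketched in a few lines. This is a minimal illustration, not the procedure used with GADRAS: the Gaussian "spectra", their peak positions, and their areas are hypothetical stand-ins for noise-free single-isotope simulations, and the helper names are our own.

```python
import numpy as np

def gaussian_peak(channels, center, width, area):
    """Toy photopeak: a Gaussian whose channel counts sum to `area`."""
    shape = np.exp(-0.5 * ((channels - center) / width) ** 2)
    return area * shape / shape.sum()

def scale_spectrum(spectrum, from_activity, to_activity):
    """Rescale a noise-free spectrum from one activity to another."""
    return spectrum * (to_activity / from_activity)

def mix_spectra(spectra):
    """Linear mixing: channel-wise sum of single-isotope spectra."""
    return np.sum(spectra, axis=0)

channels = np.arange(1024)
# Hypothetical stand-ins for noise-free single-isotope simulations.
cs137_150 = gaussian_peak(channels, center=226, width=8, area=150_000.0)
ra223 = gaussian_peak(channels, center=92, width=6, area=7_900.0)
u235 = gaussian_peak(channels, center=63, width=5, area=9_774.0)

# Scaling check: 150 uCi -> 9.7687 uCi is a multiplication by 9.7687/150.
cs137_scaled = scale_spectrum(cs137_150, from_activity=150.0, to_activity=9.7687)
# Mixing check: the mixture is the sum of the individual spectra.
mixture = mix_spectra([cs137_scaled, ra223, u235])
```

With real data, `cs137_scaled` and `mixture` would be compared channel by channel against the corresponding GADRAS-simulated spectra.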
  2.3. New Spectral Data Generation Framework
Because linear spectral mixing is observed in GADRAS when simulating multiple-isotope mixtures, we used foreground (source) and background templates to form the multi-isotope mixture training and test datasets. Instead of running GADRAS for every parameter variation of a mixture, we exploit linear mixing and combine the individual-isotope spectra simulated with GADRAS under those parameter variations. The work in [18] considered the single-isotope identification problem and only augmented single-isotope spectra with various parameter variations using a framework called ANNSA. We modified the ANNSA framework in [18] so that multiple-isotope mixture spectra can be generated with these variations. The modification is described below.
In [18], isotope identification, including gamma ray spectra in mixture form, was studied. In that work, the term “relative count contribution” was used rather than mixing ratio or activity for the isotopes that form the mixture. In this work, we use the term “mixing ratio” to refer to relative count contribution. The modified ANNSA framework treats the background as if it were an isotope in the mixture. The background’s mixing ratio (relative count contribution) in the mixture is assigned according to the user-defined signal-to-background ratio. The following technical description introduces this modification, followed by the results.
To introduce the modification steps, we consider a two-isotope mixture in which one isotope is denoted by $X$ and the other by $Y$. The measured gamma ray spectrum for this two-isotope mixture, including background, is denoted by $M_s$, and the background portion of $M_s$ is denoted by $B_s$. Let $X_s$ and $Y_s$ be the individual spectra of the two isotopes. $M_s$ can then be decomposed into the background and the two isotope contributions as follows:

$$M_s = X_s + Y_s + B_s \qquad (1)$$
Let $T$ denote the total number of counts in $M_s$, and let the mixing ratios (relative count contributions) of $X_s$, $Y_s$, and $B_s$ be denoted by $r_{X_s}$, $r_{Y_s}$, and $r_{B_s}$, respectively, where $r_{X_s} + r_{Y_s} + r_{B_s} = 1$. The count contributions of $X_s$, $Y_s$, and $B_s$ are then $T \cdot r_{X_s}$, $T \cdot r_{Y_s}$, and $T \cdot r_{B_s}$. Let SBR denote the signal-to-background ratio. In terms of the count contributions from the sources and the background, SBR can be expressed as follows:

$$\mathrm{SBR} = \frac{T \cdot r_{X_s} + T \cdot r_{Y_s}}{T \cdot r_{B_s}} = \frac{r_{X_s} + r_{Y_s}}{r_{B_s}} \qquad (2)$$
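Using (2) and $r_{X_s} + r_{Y_s} + r_{B_s} = 1$, it follows that $r_{B_s} = 1/(\mathrm{SBR} + 1)$ and $r_{X_s} + r_{Y_s} = \mathrm{SBR}/(\mathrm{SBR} + 1)$. The mixture spectrum, $M_s$, can be written as follows:

$$M_s = T \cdot M_s^{\mathrm{norm}} = T \left( r_{X_s} X_s^{\mathrm{norm}} + r_{Y_s} Y_s^{\mathrm{norm}} + r_{B_s} B_s^{\mathrm{norm}} \right) \qquad (3)$$

where $M_s^{\mathrm{norm}}$, $X_s^{\mathrm{norm}}$, $Y_s^{\mathrm{norm}}$, and $B_s^{\mathrm{norm}}$ are the normalized spectra for $M_s$, $X_s$, $Y_s$, and $B_s$, respectively. $r_{X_s}$ and $r_{Y_s}$ can then be randomly selected or manually set such that their sum ($r_{X_s} + r_{Y_s}$) is equal to $1 - r_{B_s}$.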
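The ratio assignment just described can be sketched as follows. This is a minimal illustration under our own assumptions: the function name is hypothetical, and the random split among the sources uses uniform weights, which is only one of many possible choices.

```python
import random

def assign_mixing_ratios(sbr, n_sources=2, rng=None):
    """Return (source_ratios, background_ratio) for a given SBR.

    The background ratio is 1 / (SBR + 1); the remaining SBR / (SBR + 1)
    is split among the sources using random weights (an assumption).
    """
    if rng is None:
        rng = random.Random()
    r_background = 1.0 / (sbr + 1.0)
    weights = [rng.random() for _ in range(n_sources)]
    total = sum(weights)
    r_sources = [(1.0 - r_background) * w / total for w in weights]
    return r_sources, r_background

r_sources, r_background = assign_mixing_ratios(sbr=3.0, rng=random.Random(0))
```

For SBR = 3, the background ratio is 0.25 and the two source ratios sum to 0.75, so the ratio of source to background contributions recovers the requested SBR.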
Consider N two-isotope mixture spectra, each with M channels, formed from a pool of K isotopes ($X_s^{\mathrm{norm}}$, $Y_s^{\mathrm{norm}}$, …, $Z_s^{\mathrm{norm}}$). The regression problem can then be formulated as shown in (4). The modified formulation includes the background, $B_s^{\mathrm{norm}}$, as if it were an isotope and also estimates its mixing ratio (relative count contribution).
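To make the shapes in this regression setup concrete, the sketch below builds a toy dataset with an N × M matrix of mixture spectra as inputs and an N × (K + 1) matrix of mixing ratios (K isotopes plus background) as targets. It rests on several simplifying assumptions of ours: the templates are random placeholders, one fixed template is used per isotope (which the framework does not assume), no noise is added, and ordinary least squares stands in for the regressors studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mixtures, n_channels, n_isotopes = 200, 64, 4  # N, M, K (toy sizes)

# Hypothetical normalized templates: K isotope rows plus one background row.
templates = rng.random((n_isotopes + 1, n_channels))
templates /= templates.sum(axis=1, keepdims=True)

# Targets: each mixture contains two isotopes from the pool plus background.
ratios = np.zeros((n_mixtures, n_isotopes + 1))
for row in ratios:
    pair = rng.choice(n_isotopes, size=2, replace=False)
    r_x, r_y, r_b = rng.dirichlet(np.ones(3))  # three ratios summing to 1
    row[pair] = (r_x, r_y)
    row[-1] = r_b                              # background is the last column

# Inputs: noise-free normalized mixture spectra, shape N x M.
spectra = ratios @ templates

# Ordinary least squares as a stand-in regressor: spectra @ weights ~ ratios.
weights, *_ = np.linalg.lstsq(spectra, ratios, rcond=None)
predicted = spectra @ weights
```

In this idealized setting the fit is essentially exact; with varying templates and Poisson noise the mapping is no longer exactly linear, which is part of the motivation for comparing several regression methods.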
It should be noted that, because of variation in detector-related parameters (such as source distance, height, shielding, shielding density, etc.) and augmentation parameters, no single gamma ray signature can represent an isotope. In the mixture spectrum generation phase, the isotope templates are picked from a large source template pool in which the templates are simulated with different detector parameters (source distance, height, shielding, shielding density, etc.) using GADRAS. Similarly, the background template, B, of a specific mixture is picked from the background template pool.
PLS, LR, RFR, and deep regression methods are thus more suitable in a scenario like this, since they do not require unique isotope signatures but only mixture spectra and the corresponding mixing ratios (relative count contributions) of the isotopes and background in the mixture. In addition, measured spectra are affected by Poisson noise, which arises from the randomness of counting events in the detector. The variance of Poisson noise equals the expected count, so the relative fluctuation grows as the count decreases. At lower count levels, the noise therefore becomes more noticeable and increases the error in the estimated mixing ratios. However, the spectral data generated from the GADRAS templates in this study account for multiple parameter variations. Moreover, the training and test datasets were generated consistently, so Poisson noise did not change the overall performance trend of the methods evaluated.
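The effect of count level on Poisson noise can be illustrated numerically. The sketch below is only a demonstration of the general property that a Poisson variable with mean λ has relative fluctuation 1/√λ; the two count levels and the number of trials are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_noise(expected_counts, n_trials=20_000):
    """Empirical std/mean of Poisson-distributed counts in one channel."""
    samples = rng.poisson(expected_counts, size=n_trials)
    return samples.std() / samples.mean()

low = relative_noise(25.0)       # theory: 1 / sqrt(25)     = 0.2
high = relative_noise(10_000.0)  # theory: 1 / sqrt(10_000) = 0.01
```

A channel expecting 25 counts fluctuates by roughly 20% from measurement to measurement, whereas a channel expecting 10,000 counts fluctuates by roughly 1%.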
  2.4. Processing Steps in the Framework
With the modified framework, one can generate a multi-isotope mixture with a user-defined signal-to-background ratio. A Poisson process is also included at the end to create a mixture spectrum with realistic counting statistics. The block diagram for multi-source mixture spectrum data generation is shown in 
Figure 3. The block diagram is for a two-source mixture generation; however, the framework can be extended for more than two-isotope mixtures in a similar fashion. The block diagram provides the gamma ray mixture spectrum simulation processing steps. The following processing steps are undertaken:
- Choose templates: Source (foreground) and background templates are chosen from the GADRAS-simulated template libraries. Note that these templates are not publicly available and must be generated using licensed GADRAS under specific simulation settings. 
- Normalize templates: The chosen templates are normalized with respect to the sum of channel counts. 
- Assign mixing ratios: Assign the mixing ratio for the background based on the signal-to-background ratio specified by the user. The mixing ratios of the sources (foreground) are then either randomly selected or set such that the sum of the assigned mixing ratios for background and sources is equal to 1. 
- Form the source spectrum: Add mixing-ratio-multiplied source templates to form the source spectrum. 
- Rebin: Rebin the source and background templates using the “Calibration” parameters, which define a quadratic mapping with three terms. The first is a constant rebinning term, also known as the offset. The second is a linear rebinning term, also known as the gain. The third is an optional quadratic rebinning term, also known as the non-linear term. Cubic interpolation is used to obtain the spectrum values at the rebinned channels. This step is applied to the source and background templates separately. 
- Apply the low-level discriminator (LLD): This step uses the LLD parameter and sets all spectrum values at and below the LLD channel to 0. It is applied to the source and background templates separately. 
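- Scale: Scale the mixed-source and background spectra to the total counts, where the total count is the sum of the source (foreground) counts and the background counts, as expressed in (5). The background counts and source counts are computed from the “Integration time”, “Background count rate”, and “Signal-to-background ratio” parameters as described in (6) and (7), respectively, where background_cps denotes the background counts per second.

$$\text{total\_counts} = \text{source\_counts} + \text{background\_counts} \qquad (5)$$

$$\text{background\_counts} = \text{integration\_time} \times \text{background\_cps} \qquad (6)$$

$$\text{source\_counts} = \text{background\_counts} \times \mathrm{SBR} \qquad (7)$$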
- Form final measured spectra: This phase adds the source and background spectrum followed by a Poisson process to create a simulated measured spectrum with realistic counting statistics for the mixed sources and corresponding background. 
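The processing steps above can be sketched end to end as follows. This is a simplified illustration under several assumptions of ours, not the ANNSA code: all function names and toy templates are hypothetical, linear interpolation (`np.interp`) is swapped in for the cubic interpolation described above, the rebinning convention (evaluating the template at the calibrated channel positions) is assumed, and the spectra are renormalized after the LLD step so that the scaled counts match the requested values exactly.

```python
import numpy as np

def rebin(spectrum, offset=0.0, gain=1.0, quadratic=0.0):
    """Evaluate the spectrum at quadratically calibrated channel positions.

    Linear interpolation is used here for brevity (assumption); the
    framework described above uses cubic interpolation.
    """
    channels = np.arange(spectrum.size, dtype=float)
    new_positions = offset + gain * channels + quadratic * channels**2
    return np.interp(new_positions, channels, spectrum, left=0.0, right=0.0)

def apply_lld(spectrum, lld):
    """Zero every channel at or below the low-level discriminator."""
    out = spectrum.copy()
    out[: lld + 1] = 0.0
    return out

def generate_measured_spectrum(source_templates, source_ratios, background_template,
                               sbr, integration_time, background_cps,
                               calibration=(0.0, 1.0, 0.0), lld=10, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Normalize templates by their summed channel counts.
    sources = [t / t.sum() for t in source_templates]
    background = background_template / background_template.sum()
    # Form the source spectrum from ratio-weighted source templates.
    source = sum(r * s for r, s in zip(source_ratios, sources))
    # Rebin and apply the LLD to source and background separately.
    source = apply_lld(rebin(source, *calibration), lld)
    background = apply_lld(rebin(background, *calibration), lld)
    # Scale to counts: background = time * cps, source = background * SBR.
    background_counts = integration_time * background_cps
    source_counts = background_counts * sbr
    source = source_counts * source / source.sum()
    background = background_counts * background / background.sum()
    # Add source and background, then sample Poisson counting statistics.
    return rng.poisson(source + background), source, background

def toy_peak(channels, center, width):
    """Hypothetical template: Gaussian peak on a small flat continuum."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2) + 0.01

channels = np.arange(256)
measured, source, background = generate_measured_spectrum(
    [toy_peak(channels, 60, 4), toy_peak(channels, 150, 6)], [0.45, 0.30],
    np.exp(-channels / 80.0), sbr=3.0, integration_time=60.0,
    background_cps=200.0, rng=np.random.default_rng(0))
```

In this example the background contributes 60 s × 200 cps = 12,000 expected counts and the sources contribute 3 × 12,000 = 36,000, so the measured spectrum contains about 48,000 counts before statistical fluctuation.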
  
    
  
  
Figure 3. Block diagram of the multi-isotope mixture generation framework.
 
  2.5. Detector and Augmentation Parameters
An NaI detector was used to form the source and background templates in [18]. We used the templates from [18] to form the mixed-spectrum training and test datasets in our modified framework. It is also possible to use other detectors and form source and background templates accordingly. The detector described in [
18] simulates the Ortec 905-3 2x2-in NaI (Tl) detector, which is incorporated in the Algorithm Improvement Program (AIP) software package developed by the Department of Homeland Security. The GADRAS parameters used to simulate this detector can be seen in 
Figure 4. In [
18], the default energy calibration of the NaI detector model was modified to simulate template spectra, so the detector-measured energies range from 0 MeV to 3.5 MeV. This configuration was realized by assigning zero to the calibration offset (Order 0 in E) and setting the calibration gain (Order 1 in E) to 3500. The default number of channels was changed to 1194. The default spectrum length used was 3 MeV.
The simulated source templates in [
18] correspond to a total of 29 isotopes. The isotopes used in the source template dataset are derived from the ANSI N42-34-2006 standard [
32] for isotope identification devices [
18] and consist of: 
241Am, 
133Ba, 
57Co, 
60Co, 
51Cr, 
137Cs, 
152Eu, 
67Ga, 
123I, 
125I, 
131I, 
111In, 
192Ir, 
177mLu, 
99Mo, 
237Np, 
103Pd, 
239Pu, 
240Pu, 
226Ra, 
75Se, 
153Sm, 99mTc, 
201Tl, 
204Tl, 
233U, 
235U, 
238U, and 
133Xe. The number of source templates in [
18] with all parameter variations is 65,975. These parameter combinations for the NaI detector are listed below.
- Source height (in cm): 50, 75, 100, 125, 150; 
- Source distance (in cm): 50, 112.5, 175, 237.5, 300; 
- Shielding: alum, iron, lead, none; 
- Shielding density (g/cm²): 1.82 (alum), 4.18 (alum), 7.49 (alum), 1.53 (iron), 3.50 (iron), 6.28 (iron), 0.22 (lead), 0.51 (lead), 0.92 (lead), 0.0 (none); 
- FWHM at 662 keV (%): 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0. 
There are 84 background templates in [18], generated with the following parameter values:
- FWHM at 662 keV (%): 6, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0; 
- Location: Albuquerque, Atlanta, Austin, Chicago, Knoxville, Miami; 
- Cosmic: 0, 1 (0 indicating cosmic effect is not included and 1 indicating it is included). 
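The count of 84 background templates is consistent with the full Cartesian product of these values (7 FWHM values × 6 locations × 2 cosmic settings). The sketch below enumerates that grid; the dictionary keys are our own naming, and the assumption that every combination is present is inferred from the count rather than stated in [18].

```python
from itertools import product

fwhm_percent = [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0]
locations = ["Albuquerque", "Atlanta", "Austin", "Chicago", "Knoxville", "Miami"]
cosmic_flags = [0, 1]  # 0: cosmic effect excluded, 1: included

# One entry per background template (assuming a full parameter grid).
background_grid = [
    {"fwhm": fwhm, "location": location, "cosmic": cosmic}
    for fwhm, location, cosmic in product(fwhm_percent, locations, cosmic_flags)
]
```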
In addition to the detector- and background-specific parameters mentioned above, we adopted the same augmentation parameters as [18] for multi-isotope mixture simulation. These augmentation parameters are listed below. Among them, the “mixing ratio” parameter is new and was added for multi-isotope mixture generation; the other parameters were used in [18] when generating augmented single-isotope source spectra. Varying these augmentation parameters during mixture spectrum generation creates diversity in the training and test spectrum datasets.
- Integration time (s); 
- Background count rate (background cps); 
- Signal-to-background ratio; 
- Calibration; 
- Low-level discriminator (LLD) parameter; 
- Mixing ratio for each source and background in the mixture.
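One way to organize these parameters when generating a dataset is to draw one parameter set per mixture. The sketch below is purely illustrative: the class and function names are hypothetical, and every numeric range is a placeholder of ours, not a value taken from the paper or from [18].

```python
import random
from dataclasses import dataclass

@dataclass
class AugmentationParams:
    integration_time: float  # seconds
    background_cps: float    # background counts per second
    sbr: float               # signal-to-background ratio
    calibration: tuple       # (offset, gain, quadratic) rebinning terms
    lld: int                 # low-level discriminator channel
    mixing_ratios: tuple     # (sources..., background), sums to 1

def sample_params(rng, n_sources=2):
    """Draw one parameter set. All ranges are placeholders (assumptions)."""
    sbr = rng.uniform(0.5, 5.0)
    r_background = 1.0 / (sbr + 1.0)
    weights = [rng.random() for _ in range(n_sources)]
    total = sum(weights)
    r_sources = [(1.0 - r_background) * w / total for w in weights]
    return AugmentationParams(
        integration_time=rng.uniform(10.0, 600.0),
        background_cps=rng.uniform(50.0, 300.0),
        sbr=sbr,
        calibration=(rng.uniform(-5.0, 5.0), rng.uniform(0.9, 1.1), 0.0),
        lld=10,
        mixing_ratios=(*r_sources, r_background),
    )

params = sample_params(random.Random(0))
```

Repeating the draw for each mixture yields training and test sets that vary in all of the listed parameters.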