Abstract
Generating statistically faithful short-duration gamma-ray spectra from a single long measurement is essential in nuclear safeguards, supporting tasks such as algorithm development and machine-learning applications, especially when list-mode data are unavailable. Existing subsampling methods often distort the statistical characteristics of genuine short-duration measurements, leading to biased or unreliable analytical outcomes and thereby undermining downstream tasks. In this work, we compare five subsampling approaches using a benchmark set of 156 genuine replicate spectra collected with a high-purity germanium detector. We evaluate each method with respect to run-to-run variance, channel-to-channel variance, and preservation of total counts (losslessness). Across a wide range of subsampling ratios, only binomial subsampling without replacement consistently reproduces the statistical properties of genuine short-duration spectra, maintaining proper dispersion even in sparse spectral regions and perfectly preserving total counts. These results provide a mathematically principled and practically validated framework for generating synthetically shortened spectra when true short-duration measurements are unavailable.
1. Introduction
1.1. Background and Motivation
Gamma rays emitted from spontaneous nuclear decay exhibit an energy and intensity spectrum that is characteristic of the decaying nuclear species. These spectra are invaluable for the identification and quantification of radioactive materials and serve as a critical tool across a range of applications. In particular, they are essential in the fields of nuclear safeguards, arms control, and nuclear emergency response, which are primary focus areas for the authors.
A gamma-ray spectrum is typically represented as a histogram that records the number of detected gamma-ray events within discrete energy bins. Common detector materials used to acquire such spectra include Sodium Iodide (NaI), Lanthanum Bromide (LaBr3), Cadmium Zinc Telluride (CZT), and High-Purity Germanium (HPGe). A foundational statistical assumption is that the count in each energy bin follows an independent Poisson distribution, based on the premise that gamma-ray arrivals are random, independent events occurring at a constant average rate. Under this assumption, the statistical conclusions can be generalized across different detector types. For the purposes of this study, we focus exclusively on spectra acquired with HPGe detectors.
In the analysis of gamma-ray spectra, longer counting times are generally desirable to improve statistical precision. When multiple spectra are collected under similar conditions, they can be aggregated to form a composite spectrum with Poisson characteristics equivalent to a single long-duration measurement since the sum of independent Poisson random variables is itself Poisson-distributed. In some situations, however, shorter counting times are preferred or even required. This means subdividing a parent spectrum into shorter duration ‘child’ spectra. There are numerous motivations for generating such subsampled spectra, including: evaluating peak-fitting algorithms in low-count regimes, verifying limit-of-detection (LOD) calculations, testing isotope identification accuracy in Radiation Isotope Identification Devices (RIIDs), generating synthetic source combinations with varying relative intensities, or aligning count times in training datasets. These short-run subsampled spectra, defined to be subsets of the original spectrum where the data acquisition period is shorter than that of the original spectrum, should have statistical properties consistent with what one would expect from genuine shorter-count measurements, both individually and when considered as a set. If the individual gamma-ray events are time-stamped (i.e., recorded in list-mode), this division is straightforward. However, in many practical scenarios, only a long-duration histogram is available, necessitating subsampling techniques to generate synthetic short-duration replicates. In practice, a variety of subsampling algorithms are employed, some of which result in child spectra that are either overdispersed or underdispersed relative to true Poisson behavior.
In this work, we evaluate several candidate subsampling algorithms against a benchmark set of genuine replicate spectra. We identify one method—binomial subsampling without replacement—that satisfies all our statistical criteria and is mathematically well-founded. Our findings and recommendations are presented in the sections that follow.
1.2. Prior Application Examples
Numerous fields can leverage synthetically generated, short-run spectra whenever true list-mode records are unavailable. Examples include the Swift Burst Alert Telescope (BAT) histograms available through the batsurvey package [1] and the XRF dataset of Rosales et al., which reports only energy-channel versus count histograms [2,3], among many others. Our primary interest, however, lies in gamma-ray spectroscopy applications such as those described below.
A U.S. Department of Homeland Security initiative known as the Algorithm Improvement Project (AIP) involved collaboration among several national laboratories, including Los Alamos, Sandia, Brookhaven, and Pacific Northwest National Laboratories [4]. The project worked with the commercial sector to improve the radionuclide identification performance of commercial RIIDs by supplying an extensive library of spectra, including special nuclear material (SNM) examples not readily accessible to commercial entities, and to score performance against consistent criteria. Synthetic child spectra were generated from these datasets using Poisson resampling applied to experimentally measured, deconvolved, regenerated, and smoothed “parent” spectra.
The Gamma Detector Response and Analysis Software (GADRAS) Version 19.5.2, which also played a role in the AIP effort, allows users to generate fully synthetic gamma spectra derived from nuclear data tables and user-specified detector response functions [5,6]. Users may elect to apply Poisson resampling to generate child spectra with proper variance from a synthetic parent, which intrinsically has zero variance. So long as the nuclear data are used appropriately, this is a correct application of Poisson resampling. The fidelity of the resulting spectra depends on the accuracy of the underlying nuclear data, detector response modeling, and radiation transport calculations. More recently, GADRAS has been employed to generate millions of fully synthetic spectra for training machine learning models, including multilayer perceptrons (MLPs), convolutional neural networks (CNNs), Transformers, and long short-term memory networks (LSTMs) [7]. Machine learning tools are particularly sensitive to data quality, necessitating mathematically well-founded subsampling techniques. Another Sandia-developed toolkit, PyRIID, offers functionality for generating synthetic gamma-ray spectra to support radionuclide identification and machine learning studies [8]. PyRIID constructs spectra from template components and allows Poisson noise to be applied to emulate counting statistics. However, like GADRAS, it relies on forward modeling and noise injection rather than subsampling experimentally measured spectra.
Monte Carlo simulation is widely used to generate purely synthetic gamma-ray spectra, but naïve simulated spectra should not be assumed to be interchangeable with experimental measurements. As demonstrated by Kwon et al., standard Geant4 outputs differ substantially from real HPGe spectra in peak heights, high-energy summing regions, and overall spectral shape [9,10]. These discrepancies arise because uncorrected Monte Carlo spectra omit essential detector-physics effects, including dead time, charge-collection dynamics, pulse pile-up, and coincidence summing, that must be modeled explicitly to reproduce experimentally observed behavior. Quantitative comparisons reveal the severity of this gap, which may be greatly reduced through appropriate post-processing, but the result is still not a perfect substitute for real measurement.
Lalor et al. examined the consequences of this simulation–experiment gap for machine learning models used in radioisotope identification. They considered both simulation-to-simulation and simulation-to-experimental domain adaptation scenarios and showed that training a classifier purely on synthetic spectra can lead to substantial misclassification risk when applied to experimental data because of differences in the underlying data distributions, which they describe as the “sim-to-real gap” [7]. Their results indicate that, while purely synthetic spectra are valuable for pretraining, incorporating even limited genuine experimental spectra remains essential for achieving high performance.
The Replicative Assessment of Spectroscopic Equipment (RASE) software Version 2.0, developed for RIID performance validation, utilizes Poisson resampling by default to produce synthetic child spectra from experimentally measured parent spectra [11]. RASE also supports alternative sampling methods, including rejection sampling and inverse transform sampling, which users may select. The motivation for the RASE software and the International Atomic Energy Agency (IAEA)-sponsored study of the same name is discussed in [12]; it is notable that this document specifies selecting random numbers from a “density probability distribution corresponding to the normalized base spectrum,” implying that only sampling with replacement is considered (we believe that sampling without replacement is essential to meet our proposed criteria). A validation study by Flynn et al. [13] demonstrated that the Poisson-resampled spectra were statistically equivalent to experimental replicates only when the simulated acquisition time did not exceed 4% of the parent spectrum’s acquisition time. Exceeding this threshold was shown to introduce statistically significant deviations similar to what we observe in the present work. The three subsampling methods currently implemented in RASE are suitable for the majority of its application space, but binomial subsampling is under consideration as an additional option.
The Fixed-energy Response-function Analysis with Multiple efficiencies (FRAM) software Version 6.1 [14] includes a lesser-known feature allowing simulated spectra to be generated via Poisson subsampling of a measured parent spectrum. Although this feature is not integral to FRAM’s core isotopic analysis functionality, it provides a capability analogous to RASE. A future release of FRAM is expected to include support for binomial subsampling as an additional option.
Burr et al. explore Poisson sampling to augment genuine experimental spectra with synthetic spectra for RIID algorithm testing, demonstrating the adequacy of the technique for radioisotope detection and discussing the bias-variance tradeoff [15], although sparse statistical regions are barely discussed. Burr & Hamada [16] note that formal validation studies comparing synthetic vs. real spectra are lacking and outline some of the difficulties with synthetic spectra.
1.3. Limitations and Scope of Applicability
Our framework and validation tests assume that each channel of the parent spectrum follows an independent Poisson distribution. This assumption holds for raw count data but not for background-subtracted spectra, which deviate from Poisson statistics and may contain negative channel counts. In such cases, both binomial and Poisson subsampling will fail to pass some or all of our tests. Many RASE workflows, for example, operate on background-subtracted parent spectra and require a different set of tools. Additionally, methods involving smoothing or deconvolution to reduce statistical noise can be appropriate in specific contexts but produce parent spectra that are non-Poisson. Applying binomial subsampling to such data yields underdispersed child spectra and is thus invalid.
In genuine replicate measurements of radioactive sources, spectral changes occur over time due to decay and ingrowth of radionuclides. Here, we restrict consideration to conditions where such time-dependent effects are negligible; modeling or reproducing these temporal behaviors lies beyond the present scope. Spectra can also exhibit time-dependent gain drift (energy calibration shifts). While the genuine replicate spectra used here showed negligible drift, our simulated child spectra will, by construction, exhibit zero gain drift, which is an idealization. Radionuclide decay and detector drift, which may be relevant in longer measurement campaigns, are identified as topics for future investigation. Additionally, our evaluation metrics are based on empirical distributions across child spectra and do not assume independence between synthetic runs; however, downstream users should be aware that because we require sampling without replacement, synthetic children derived from a single parent are not strictly independent replicates.
Finally, we did not investigate the effect of adjacent-channel correlations on our results. Because detectors have finite energy resolution, neighboring channels are not strictly independent, which becomes more problematic when gain drift occurs. A model of adjacent-channel correlations could be built by cross-examining binned spectra with list-mode counterparts; however, list-mode data was unavailable for this study, so we leave it as a future exercise.
2. Materials and Methods
2.1. Notation and Criteria
In this section we present methods to generate synthetic child spectra. The required notation is presented here. Let $x_i$ be the number of counts in channel $i$ of a parent spectrum, with $\lambda_i$ the expected counts per unit of time and $T$ the total measurement time. $y_i$ will refer to channel $i$ of a child spectrum generated with desired measurement time $t$. When synthetically generating more than one child spectrum, we will add an additional index $j$ indicating the synthetic run number, such that $y_{ij}$ represents counts in the $i$th channel of the $j$th child spectrum. The letter $m$ will denote the total number of synthetic runs, such that $1 \le j \le m$.
The choice of spectral subsampling method depends on the statistical and operational requirements. In our case, the method must be unbiased and statistically robust in sparse data regions, as a primary use case is to evaluate the accuracy of algorithms used for limit-of-detection (LOD) estimation and peak area uncertainty analysis under low-count conditions. Furthermore, we require that the method should be unbiased across the entire range of child-to-parent count time ratios. To this end, the following criteria are defined:
- Run-to-Run Variance: For a given channel, the variation observed across multiple child spectra should be consistent with that expected from independent experimental replicates. This criterion aligns with key factor (2) from Flynn et al. [13].
- Channel-to-Channel Variance: Within a single child spectrum, variation between adjacent channels (“fuzziness”) should reflect the statistical behavior expected from a truly Poisson-distributed signal.
- Losslessness: The synthetic children should collectively partition the parent spectrum so that the total number of counts in each channel is preserved. All parent counts must be allocated exactly once, with no duplication or omission. Because reusing parent counts violates this requirement and can introduce undesired dependence among children, methods that sample with replacement are unsuitable for strictly lossless applications. We acknowledge that in some contexts, such as generating large sets of synthetic replicates where strict unbiasedness is not essential, the losslessness requirement may be reasonably relaxed. However, this is not appropriate for our use case.
2.2. Method 1A: Poisson Sampling
Choose

$$y_i \sim \mathrm{Poisson}(\hat{\lambda}_i t), \qquad (1)$$

where $\hat{\lambda}_i$ is estimated from the observed number of counts in the parent spectrum as $\hat{\lambda}_i = x_i/T$. This estimate will generally deviate from the true parameter $\lambda_i$, as the estimator has its own variance $\mathrm{Var}(\hat{\lambda}_i) = \lambda_i/T$. This could make the child spectra overdispersed, particularly in sparse regions of the spectrum, but this effect may not be noticeable when $t$ is much smaller than $T$.
Raikov’s theorem, which states that if a Poisson random variable admits a decomposition as the sum of independent random variables, the constituents must themselves be Poisson-distributed, is the mathematical motivation for the application of Method 1A [17]. This method appears to be the most popular method to generate synthetic spectra. It is used by multiple software packages as discussed above.
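As a concrete sketch (not the exact implementation used by any of the packages discussed above), Method 1A can be written in a few lines of NumPy; the function name and toy parent spectrum are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def poisson_child(parent_counts, T_parent, t_child):
    """Method 1A sketch: each channel's rate is estimated as x_i / T,
    and a child count is drawn as Poisson(rate * t)."""
    lam_hat = np.asarray(parent_counts, dtype=float) / T_parent
    return rng.poisson(lam_hat * t_child)

# Toy 5-channel parent collected for 3600 s; draw one 300 s child.
parent = np.array([12000, 300, 40, 5, 0])
child = poisson_child(parent, T_parent=3600.0, t_child=300.0)
```

Note that a channel with zero parent counts can never produce child counts under this scheme, since its estimated rate is exactly zero.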
2.3. Method 1B: Variance-Corrected Poisson Sampling
To correct for channel-to-channel overdispersion from Method 1A, we introduce a variance correction by finding a transformation that makes $\mathrm{Var}(y_i) = \lambda_i t$, as would be expected for a properly dispersed spectrum. Choose

$$y_i \sim \mathrm{Poisson}(\hat{\lambda}_i t'). \qquad (2)$$

By construction, $y_i$ is the sum of two correlated random variables, the Poisson fluctuation about $\hat{\lambda}_i t'$ and the random mean $\hat{\lambda}_i t'$ itself, so we can compose the variance as

$$\mathrm{Var}(y_i) = \lambda_i t' + (t')^2\,\mathrm{Var}(\hat{\lambda}_i) = \lambda_i t' + \frac{\lambda_i (t')^2}{T}. \qquad (3)$$

A complete derivation is given in Appendix A.1. After setting (3) equal to $\lambda_i t$, we find the required transformation (to first order in $t/T$):

$$t' = t\left(1 - \frac{t}{T}\right). \qquad (4)$$

The adjusted time, $t'$, will naturally be less than $t$. For example, if the parent spectrum was collected for 1 h but 300 s is desired, then $t'$ would be 275 s.
While this method should correct for channel-to-channel overdispersion, it may also produce run-to-run underdispersion when generating a set of child spectra. Additionally, the rounding necessary to produce integer output introduces a new source of variance that may be noticeable in low-count regions of the spectrum.
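One way to realize Method 1B, assuming the first-order adjustment $t' = t(1 - t/T)$ recovered above (function names and toy values are ours, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(7)

def adjusted_time(t_child, T_parent):
    """Variance-corrected acquisition time, t' = t * (1 - t/T)."""
    return t_child * (1.0 - t_child / T_parent)

def poisson_child_corrected(parent_counts, T_parent, t_child):
    """Method 1B sketch: Poisson sampling at the shortened time t'."""
    t_prime = adjusted_time(t_child, T_parent)
    lam_hat = np.asarray(parent_counts, dtype=float) / T_parent
    return rng.poisson(lam_hat * t_prime)

# The worked example from the text: a 1 h parent and a 300 s child
# give an adjusted time of approximately 275 s.
t_prime = adjusted_time(300.0, 3600.0)
```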
2.4. Method 2: Binomial Sampling
The binomial sampling approach is closely related to a thinning operator that was first introduced by Steutel & van Harn in 1979, who showed that Poisson distributions are closed under thinning and retain their functional form when each event is independently retained with probability $p$ [18]. The operator was later formalized and widely adopted under the name “binomial thinning” in the statistics literature, where it serves as a fundamental variance-preserving transformation for Poisson count processes [19]. Despite its long history in discrete probability theory, we find no prior applications of such an approach to gamma-ray spectra, nor to the problem of generating statistically faithful short-duration spectra or families of spectra from a single long observation. To formalize the method, choose

$$y_i \sim \mathrm{Binomial}(x_i,\, p), \qquad p = \frac{t}{T}.$$
For the case of $p = 1/2$, this can be imagined as flipping a fair coin for each count in the spectrum; if the result is “heads” then the count is assigned to the first of two child spectra, and if the result is “tails” then the count is assigned to the second. We can then use $y_i$ to represent the number of heads in channel $i$ and introduce a second random variable $z_i = x_i - y_i$ to denote the number of tails. In this fashion the run is split into two distinct parts while preserving the expected variance. The technique is mathematically justified, as the marginal distribution of $y_i$ is Poisson with mean $p\lambda_i T = \lambda_i t$ (for proof, see Appendix A.2).
Method 2 can be extended to generate more than two child spectra through cascading binomial sampling, in which repeated applications of the method further subdivide the data. This approach is mathematically equivalent to applying the multinomial distribution. To preserve run-to-run variance, Method 2 will be implemented without replacement—that is, each newly generated child spectrum draws from, and correspondingly depletes, the remaining counts in the parent spectrum until no counts remain.
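A minimal sketch of cascading binomial subsampling without replacement, assuming NumPy and an equal time split across $m$ children (the function name and toy data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def binomial_split(parent_counts, m):
    """Method 2 sketch: split a parent into m equal-time children without
    replacement via cascading binomial sampling.

    Child j draws Binomial(remaining_i, 1/(m - j)) in each channel and
    depletes the parent; the cascade is equivalent to a multinomial split,
    so every parent count is allocated exactly once (losslessness).
    """
    remaining = np.asarray(parent_counts, dtype=np.int64).copy()
    children = []
    for j in range(m - 1):
        child = rng.binomial(remaining, 1.0 / (m - j))
        remaining -= child
        children.append(child)
    children.append(remaining)  # the last child takes all remaining counts
    return children

parent = np.array([12000, 300, 40, 5, 0])
kids = binomial_split(parent, m=12)
```

Because each child is drawn from (and subtracted out of) the counts still remaining in the parent, the children sum channel-by-channel to exactly the parent spectrum.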
2.5. Method 3A: Inverse Transform Sampling
Method 3A treats the parent spectrum as an empirical estimate of the underlying energy distribution and then draws individual counts for the child spectrum by inverse transform sampling. Let $Y$ be a non-negative integer-valued random variable with probability mass function

$$p_Y(k) = P(Y = k)$$

and cumulative distribution function

$$F_Y(k) = P(Y \le k) = \sum_{j=0}^{k} p_Y(j).$$

In our setting, the parent spectrum provides an empirical distribution over channels. Let $x_i$ be the total parent counts in channel $i$ and define

$$\hat{p}_i = \frac{x_i}{\sum_k x_k}$$

as the empirical probability that a count resides in channel $i$. The cumulative distribution function $F(i) = \sum_{k \le i} \hat{p}_k$ determines the channel assignment for each synthetic count. For a desired child live time $t$, we compute the target number of synthetic counts as $n = \mathrm{round}\!\left(\tfrac{t}{T}\sum_k x_k\right)$. Method 3A then proceeds as follows to generate a single child spectrum:
- (i) Sample $n$ independent random variables $u_1, \dots, u_n \sim \mathrm{Uniform}(0, 1)$.
- (ii) For each $u_j$, find the channel $k$ such that $F(k-1) < u_j \le F(k)$ and increment channel $k$ in the child spectrum.
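The two steps above can be sketched with NumPy as follows (an illustrative implementation, not RASE's; the function name and toy spectrum are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def inverse_transform_child(parent_counts, T_parent, t_child):
    """Method 3A sketch: one child drawn by inverse transform sampling."""
    parent = np.asarray(parent_counts, dtype=float)
    N = parent.sum()
    n = int(round(N * t_child / T_parent))  # target counts in the child
    cdf = np.cumsum(parent) / N             # empirical CDF over channels
    u = rng.random(n)                       # n Uniform(0, 1) variates
    # Channel k satisfying F(k-1) < u <= F(k):
    idx = np.searchsorted(cdf, u, side='left')
    idx = np.minimum(idx, parent.size - 1)  # guard against float round-off
    return np.bincount(idx, minlength=parent.size)

parent = np.array([12000, 300, 40, 5, 0])
child = inverse_transform_child(parent, T_parent=3600.0, t_child=300.0)
```

Every draw consults the same fixed CDF, so this is sampling with replacement: repeated children reuse the same parent counts.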
The method is visualized in Figure 1. On the left is a gamma-ray spectrum collected using a high-purity germanium (HPGe) detector and a Pu calibration standard; on the right, Method 3A is visualized.
Figure 1.
Gamma-ray spectrum acquired using a high-purity germanium (HPGe) detector and a plutonium calibration source (left) and the corresponding cumulative distribution function (CDF) used for inverse-transform subsampling (right). The red ‘X’ marks the point where a sampled uniform variate intersects the CDF, indicating that channel 3231 would be incremented for this realization.
Conditional on the child total $n$, the child counts follow a multinomial distribution with parameters $n$, the total counts in the synthetic child, and $\hat{p}_i$, the proportion of those counts expected in channel $i$. Under this model, $E[y_i] = n\hat{p}_i$ and $\mathrm{Var}(y_i) = n\hat{p}_i(1-\hat{p}_i)$, so the variance is slightly smaller than that of a Poisson random variable with mean $n\hat{p}_i$. Also, the rounding step used to define $n$ can bias the total number of counts assigned to each child. Apart from these effects, inverse transform sampling is distribution-agnostic and works directly with the empirical parent histogram.
This method is implemented as a non-default method by Chavez et al. [11] in the RASE software package. It is susceptible to information degradation or amplification in the progeny due to the nature of sampling with replacement.
2.6. Method 3B: Inverse Transform Sampling with Partial Replacement
Method 3B uses the same inverse-transform machinery as Method 3A but modifies the parent spectrum after each synthetic child is generated in order to reduce repeated use of the same parent counts. The goal is to improve losslessness by limiting parent count re-use. To illustrate, let $x_i^{(0)}$ denote the initial parent counts in channel $i$ and $N^{(0)} = \sum_i x_i^{(0)}$ the corresponding total counts. To generate the first synthetic child, we define empirical probabilities

$$\hat{p}_i^{(0)} = \frac{x_i^{(0)}}{N^{(0)}}$$

and choose

$$n = \mathrm{round}\!\left(\frac{t}{T}\,N^{(0)}\right).$$
The algorithm for a single child spectrum is:
- (i) Sample $n$ independent random variables $u_1, \dots, u_n \sim \mathrm{Uniform}(0, 1)$.
- (ii) For each $u_j$, find the channel $k$ such that $F(k-1) < u_j \le F(k)$, then
  - (a) increment channel $k$ in the child spectrum;
  - (b) decrement channel $k$ in the parent spectrum, i.e., set $x_k \leftarrow x_k - 1$.
- (iii) After completing all $n$ allocations, recompute the empirical probabilities and CDF from the updated parent spectrum.
In principle, one could recompute the CDF after each allocation to perfectly enforce sampling without replacement; however, for spectra (like ours) containing a very large number of gamma-ray incidences, this would be computationally infeasible. By updating in batches, we produce an approximation intended to retain most of the losslessness benefit without the computational burden.
Because the parent spectrum is depleted as children are generated, the sum of all child spectra remains close to the original parent spectrum, and Method 3B is substantially more lossless than Method 3A. On the other hand, recomputing the CDF in batches introduces a small bias since it is still possible to allocate counts to channels that have recently been depleted in the parent. Nonetheless, this method preserves total counts much more effectively than methods that sample repeatedly from a fixed parent histogram without any adjustment.
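A sketch of the batch-updated scheme, assuming NumPy; the clipping of over-allocated channels below is a simplification corresponding to the small bias discussed above (the function name and toy data are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def its_partial_replacement(parent_counts, m):
    """Sketch of Method 3B: inverse transform sampling with batch depletion.

    Each child of n counts is drawn from the current CDF; the allocated
    counts are then removed from the parent and the CDF is recomputed
    before the next child is drawn (updating per batch, not per count).
    """
    remaining = np.asarray(parent_counts, dtype=np.int64).copy()
    n = int(round(remaining.sum() / m))
    children = []
    for _ in range(m):
        total = remaining.sum()
        if total == 0:
            children.append(np.zeros_like(remaining))
            continue
        cdf = np.cumsum(remaining) / total
        idx = np.searchsorted(cdf, rng.random(n), side='left')
        idx = np.minimum(idx, remaining.size - 1)  # guard against round-off
        child = np.bincount(idx, minlength=remaining.size)
        # The batch is drawn with replacement, so a channel can be
        # over-allocated relative to its remaining counts; clipping here
        # is the small bias discussed in the text.
        remaining = np.maximum(remaining - child, 0)
        children.append(child)
    return children

parent = np.array([9000, 600, 300, 90, 10])
kids = its_partial_replacement(parent, m=4)
```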
2.7. Other Methods
Two other methods have been observed in applied practice that are not considered here: Gaussian subsampling and simple rescaling. Gaussian subsampling involves both rounding and truncation and is clearly unsuitable in a sparse Poisson data regime. Simple rescaling likewise requires rounding to produce integer counts in each channel and would yield a child spectrum that is radically underdispersed as well as an entire generation of identical children.
3. Results and Discussion
The dataset used for testing consists of 156 genuine child spectra, each collected for 300 s (with live time slightly less). By summing the counts at each channel, a longer-run spectrum was produced with duration 46,800 s. This spectrum will serve as the parent for synthetic data. By comparing genuine to synthetic replicates, we can deduce whether the synthetic replicates reproduce the desired statistical behavior of the genuine replicates.
As acknowledged above, the total counts may drift over time due to radionuclide decay and ingrowth, leading to small increases or decreases in total spectral counts per measurement, with potentially larger effects on individual peaks. In the present dataset, the change in predicted total counts between the first and last run, due mostly to ingrowth of Am-241, was negligible relative to our testing criteria. This drift is ignored in the following investigations.
In the remainder of this section, we will test each subsampling method for three test cases: $m = 2$, $m = 12$, and $m = 156$ (equivalently, $t/T = 1/2$, $1/12$, and $1/156$). This range probes method performance across both extreme and moderate subsampling scenarios. Synthetic spectra have been produced for each of these test cases, with channel index $i$ ranging from 1 to the total number of channels and run index $j$ ranging from 1 to 2, 1 to 12, or 1 to 156.
3.1. Run-to-Run Variance
To evaluate run-to-run variance, we compute the average normalized variance across all channels for both the synthetic replicates and the genuine replicates, and then compare these values. If the synthetic and genuine averages do not differ significantly, the subsampling method is deemed to have passed the run-to-run variance test.
To formalize this test, we first compute the variance at channel $i$ across all children. Specifically:

$$s_i^2 = \frac{1}{m - 1}\sum_{j=1}^{m}\left(y_{ij} - \bar{y}_i\right)^2, \qquad \bar{y}_i = \frac{1}{m}\sum_{j=1}^{m} y_{ij}.$$

We then normalize each $s_i^2$ by dividing by the average number of counts: $v_i = s_i^2/\bar{y}_i$. This value should always be close to unity, which allows visualization across a broad range of channel intensities. Figure 2 shows $v_i$ for the set of 156 genuine replicates. Note that the metric is centered around unity. Also, note the distinctively non-Gaussian behavior in the sparse region where, on average, each child spectrum has one or fewer counts. Hence, we do not expect $v_i$ to be symmetric about unity when averaged over all $i$.
Figure 2.
Normalized run-to-run variance for 156 replicate spectra. The (left) panel shows all channels with ample counts, where the normalized variance is centered around unity. The (right) panel shows the sparse region, where mean counts per child are at or below one and the distribution of normalized variances is explicitly non-Gaussian. The observed banding is a consequence of low-count Poisson statistics.
Lastly, we calculate the mean of the $v_i$ values across all channels, yielding the average normalized variance. This can be expressed as:

$$\bar{v} = \frac{1}{n_{\mathrm{ch}}}\sum_{i=1}^{n_{\mathrm{ch}}} v_i,$$

where $n_{\mathrm{ch}}$ is the total number of channels.
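As an illustrative computation of this statistic (assuming NumPy; zero-count channels are simply skipped here, which is one possible convention, and the toy data are ours):

```python
import numpy as np

def average_normalized_variance(children):
    """Run-to-run statistic: per-channel variance across children divided
    by the per-channel mean, averaged over channels (expected ~1 for
    Poisson-distributed counts)."""
    y = np.asarray(children, dtype=float)  # shape (m runs, channels)
    mean_i = y.mean(axis=0)
    var_i = y.var(axis=0, ddof=1)          # unbiased variance across runs
    v_i = np.divide(var_i, mean_i,
                    out=np.full_like(mean_i, np.nan), where=mean_i > 0)
    return np.nanmean(v_i)                 # average over non-empty channels

# Properly dispersed Poisson children give a value near unity.
rng = np.random.default_rng(3)
toy_children = rng.poisson(50.0, size=(156, 1000))
v_bar = average_normalized_variance(toy_children)
```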
Table 1 presents the average normalized variance $\bar{v}$ and its associated uncertainty (twice the standard error of the mean) for the three cases. Table 2 lists p-values from Wilcoxon signed-rank tests comparing $\bar{v}$ for synthetic replicates to that of the genuine replicates. Because the distributions of the normalized variance metrics are skewed and strongly non-Gaussian in low-count regions (Figure 2), we use the Wilcoxon signed-rank test instead of a parametric paired t-test. The Wilcoxon test does not assume normally distributed differences and is more robust than the t-test when the data contain outliers or heavy tails. Other nonparametric alternatives exist, such as the sign test, but the sign test uses only the sign of the paired differences and discards their magnitude, whereas here we care about both the direction and the size of the effect. A significance level of $\alpha = 0.05$ was chosen, and the p-values were adjusted using the Benjamini–Hochberg procedure to control the false discovery rate. While Poisson (naïve), binomial (no replacement), and inverse transform (with replacement) all produce synthetic child spectra where $\bar{v}$ is not significantly different from that of the genuine spectra, Poisson (variance-corrected) and inverse transform (partial replacement) produce underdispersed and overdispersed children, respectively. These effects are most profound when $m = 2$.
Table 1.
$\bar{v}$ computed for all selected scenarios. Uncertainty is presented as twice the standard error of the mean.
Table 2.
Summary of multiple hypothesis tests comparing $\bar{v}$ for genuine replicates to $\bar{v}$ for synthetic replicates. The reported statistic is the adjusted p-value associated with the Wilcoxon signed-rank test. A double asterisk emphasizes test cases where the difference was significant at the $\alpha = 0.05$ level.
To further assess the relationship between progeny size and run-to-run variance, the test set was maximally expanded. Sets of synthetic child spectra were produced for all factors of 156, that is, $m = 2, 3, 4, 6, 12, 13, 26, 39, 52, 78$, and $156$. A visualization of the relationship is given in Figure 3. Binomial sampling without replacement, naïve Poisson sampling, and inverse transform sampling with replacement all preserve run-to-run variance with comparable success across tested progeny sizes.
Figure 3.
Difference in mean normalized variance between synthetic and genuine spectra as a function of the number of child spectra. Point estimates are associated with 95% confidence intervals, though the intervals are narrow. Binomial sampling without replacement, inverse-transform sampling with replacement, and naïve Poisson sampling preserve run-to-run variance across a broad range of progeny sizes.
The profound failure of inverse transform sampling with partial replacement is surprising. Because the standard inverse transform method succeeds, and because the disparity is significantly greater for smaller progeny sizes, we suspect the nature of partial replacement drives this failure. Consider for a moment the $m = 2$ case. The inverse transform algorithm samples the first child spectrum from the parent with replacement. This means that by random chance, some channels in the child will populate with an observed count proportion different than that of the parent. Then, we subtract from each parent channel the counts allocated to the first child, which results in a fundamentally different CDF for the second child when compared to the first. As we increase the number of children, the algorithm has an increasing number of opportunities for recovery, such that the algorithm performs fine for large progeny sizes but fails catastrophically when the child set is small. Hence, our efforts to design a computationally efficient workaround for the problem of sampling with replacement made us the unwitting progenitors of a heavily biased sampling scheme.
3.2. Channel-to-Channel Variance
Channel-to-channel variance was studied by focusing on a flat section of the spectrum where no peaks are visible and the relationship between slope and energy is negligible. Additionally, we will select a sparse region with few counts per bin. Sparse spectral regions are of particular interest to researchers in the field of gamma-ray spectroscopy. Accurately representing sparse regions is required for algorithmic spectrum evaluation [20], to understand the naturally present gamma-ray background [21], and to characterize properties of the detector itself. Hence, a good subsampling algorithm must produce child spectra that maintain appropriate variance properties even in the sparse regions. Sparse regions amplify even small statistical artifacts, making them ideal for discriminating among subsampling methods. The chosen region is highlighted in Figure 4 (left) and spans channels 13,800 through 14,500.
Figure 4.
Parent HPGe gamma-ray spectrum with the sparse, approximately flat region used for channel-to-channel variance analysis highlighted (left) and counts in a chosen section of this region with the mean shown as a black dashed line and bounds shown as blue dashed lines (right). Sparse regions provide high sensitivity for detecting statistical artifacts introduced by subsampling.
Channel-to-channel variance will be calculated as the between-channel variance for the selected region normalized to the mean counts. Let $l$ be the index of the region lower bound and $u$ the index of its upper bound, such that $w = u - l + 1$ is the number of channels in the chosen region. Then the normalized channel-to-channel variance is calculated as

$$c = \frac{1}{\bar{y}}\cdot\frac{1}{w - 1}\sum_{i=l}^{u}\left(y_i - \bar{y}\right)^2, \qquad \bar{y} = \frac{1}{w}\sum_{i=l}^{u} y_i.$$
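This metric is straightforward to compute; a sketch assuming NumPy (the function name and toy spectrum are ours, with region bounds matching those quoted in the text):

```python
import numpy as np

def normalized_channel_variance(spectrum, l, u):
    """Between-channel variance over the flat region [l, u], normalized by
    the region's mean counts (expected ~1 for Poisson-distributed data)."""
    region = np.asarray(spectrum[l:u + 1], dtype=float)
    return region.var(ddof=1) / region.mean()

# A flat, sparse toy spectrum: the metric should sit near unity.
rng = np.random.default_rng(4)
toy_spectrum = rng.poisson(3.0, size=16384)
c_hat = normalized_channel_variance(toy_spectrum, 13800, 14500)
```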
While Wilcoxon signed-rank tests were a natural way to study run-to-run variance, conducting significance tests on the difference in normalized channel-to-channel variance between synthetic and genuine child spectra is difficult at small progeny sizes because the small sample size reduces statistical power. To accommodate these scenarios, we implemented a bootstrapping algorithm. At each iteration, the synthetic and genuine child spectra are resampled with replacement, the normalized channel-to-channel variance is computed for each group, and the values are stored in separate vectors. A difference vector is then computed, and bootstrap confidence intervals are obtained from its percentiles. A summary of these results is given in Figure 5, which compares the distributions of mean differences between resampled synthetic and genuine data; the median is denoted by a diamond, with the confidence interval shown above and below. The variance-correction method appears to preserve channel-to-channel variance at the largest tested progeny size but performs worse as the progeny size shrinks. All methods perform reasonably well at larger progeny sizes, but only binomial sampling without replacement performs well at the smallest. Across all scenarios, only binomial sampling without replacement consistently produces child spectra with channel-to-channel variance indistinguishable from that of the genuine spectra.
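The percentile bootstrap described above can be sketched as follows. This is a hypothetical implementation: the resample count, interval level, and seed are placeholders, not the paper's exact settings.

```python
import numpy as np

def bootstrap_diff_ci(synthetic, genuine, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the difference in mean normalized
    channel-to-channel variance between synthetic and genuine children."""
    rng = np.random.default_rng(seed)
    synthetic = np.asarray(synthetic, dtype=float)
    genuine = np.asarray(genuine, dtype=float)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each group with replacement and store the mean difference.
        s = rng.choice(synthetic, size=synthetic.size, replace=True)
        g = rng.choice(genuine, size=genuine.size, replace=True)
        diffs[b] = s.mean() - g.mean()
    lo, hi = np.quantile(diffs, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(np.median(diffs)), (float(lo), float(hi))

# Identical groups give a degenerate interval centered on zero:
med, ci = bootstrap_diff_ci(np.ones(30), np.ones(30), n_boot=500)
```

An interval that straddles zero indicates no detectable difference in channel-to-channel variance between the synthetic and genuine groups, which is the criterion visualized in Figure 5.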
Figure 5.
Bootstrapped distributions of the mean difference in channel-to-channel variance between synthetic and genuine spectra at the three tested progeny sizes. Diamonds denote the median mean difference, with error bars representing confidence intervals. Confidence intervals overlapping zero indicate no statistically significant difference in channel-to-channel variance between the groups. Only binomial sampling without replacement preserves channel-to-channel variance across all tested progeny sizes.
As with run-to-run variance, the bootstrapping procedure was extended to progeny sizes ranging up to 156. The trend for each method is visualized in Figure 6. All methods approach the correct level of dispersion, but binomial sampling without replacement once again consistently outperforms the other methods across all progeny sizes.
Figure 6.
Difference in mean channel-to-channel variance between synthetic and genuine spectra across progeny sizes. Error bars denote confidence intervals. Binomial sampling without replacement and variance-corrected Poisson sampling with replacement both appear to preserve channel-to-channel variance in the sparse spectral regions.
While both variance-corrected Poisson and binomial sampling seem to pass the channel-to-channel variance test, variance-corrected Poisson sampling requires a rounding step that introduces bias. In Figure 7, we present count frequency plots to visualize this bias. Smooth orange curves fit a generalized Poisson distribution to the count frequency data through the method of Maximum Likelihood Estimation (MLE). While variance-corrected Poisson sampling yields measures of center and spread that agree with the genuine data, an oscillatory artifact can be observed as a consequence of rounding. This oscillation is not present in the genuine data, making variance-corrected Poisson sampling a covertly problematic subsampling routine. Binomial sampling without replacement, on the other hand, produces child spectra with count frequencies that closely match the genuine data.
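The rounding artifact is easy to reproduce in isolation. The sketch below is illustrative only (it is not the paper's Method 1B): it scales Poisson counts by an arbitrary non-integer factor and rounds, so alternately two or three consecutive source integers collapse onto each output integer, producing oscillating frequencies of the kind seen in Figure 7.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.poisson(10.0, size=100_000)   # well-dispersed integer counts
y = np.rint(0.37 * x).astype(int)     # non-integer scaling followed by rounding

freq = np.bincount(y)
# Rounding maps alternately two or three consecutive source integers onto
# each output integer, so neighboring frequencies oscillate instead of
# following a smooth Poisson-like curve.
sources_per_target = np.bincount(np.rint(0.37 * np.arange(28)).astype(int))
```

The `sources_per_target` vector alternates between 2 and 3, which is exactly why adjacent bins of `freq` are alternately inflated and starved even though the overall center and spread remain plausible.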
Figure 7.
Frequency distributions of counts in the sparse spectral region. Error bars represent the expected variance of a Poisson variate, and dashed curves show the generalized-Poisson maximum-likelihood fit to the genuine replicates. (Left) The distribution from variance-corrected Poisson sampling, which exhibits unphysical oscillatory artifacts due to rounding even though the global distribution remains accurate. (Right) The distribution from binomial sampling without replacement, which matches the desired distribution in all respects.
3.3. Losslessness
Losslessness will be evaluated using two distinct methods: first, by comparing raw count totals, and second, by calculating the Wasserstein distance between the genuine and synthetic data. The first test assesses the preservation of total counts while the second assesses the extent of distributional disagreement. For the first test, we compute the total counts in each child spectrum by summing across channels to get C, which can be expressed mathematically as

$$C = \sum_{i=1}^{N} c_i,$$

where $c_i$ is the count in channel $i$ and $N$ is the number of channels.
The normalized count total is then computed by dividing C by the total counts in the parent spectrum. Values less than unity indicate a loss of information due to sampling, while values greater than unity indicate that information has been added; any deviation from unity indicates that the method is not lossless. We report this ratio as the percentage of counts preserved by the method. Figure 8 compares this percentage by method across progeny sizes. Binomial sampling without replacement is perfectly lossless, and the two inverse transform methods are both nearly lossless. The Poisson sampling methods, however, produce more counts than appropriate at some progeny sizes and fewer than desired at others.
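For reference, a lossless binomial partition can be sketched as follows (an illustrative implementation, not the authors' code): each channel's counts are split among the children without replacement via sequential binomial draws, which is equivalent to a multinomial allocation, so the children sum exactly to the parent.

```python
import numpy as np

def recovered_fraction(children, parent):
    """Normalized count total: summed child counts over parent counts.
    Exactly 1.0 for a lossless method."""
    return sum(int(c.sum()) for c in children) / int(np.sum(parent))

rng = np.random.default_rng(3)
parent = rng.poisson(100.0, size=1_000)

n_children = 4
children, remaining = [], parent.copy()
for j in range(n_children):
    # Allocate each remaining count to child j with prob 1/(n_children - j);
    # on the final pass the probability is 1, so nothing is left behind.
    draw = rng.binomial(remaining, 1.0 / (n_children - j))
    children.append(draw)
    remaining = remaining - draw
```

Because every count is assigned to exactly one child, the recovered fraction is identically 1.0, matching the perfect losslessness reported for binomial sampling without replacement.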
Figure 8.
Percentage of total counts recovered by each method across progeny sizes, as computed for the full spectrum. All methods are reasonably lossless, but only binomial sampling without replacement is perfectly lossless.
As previously discussed, we are particularly interested in preserving sparse spectral regions. By considering only the portion of the spectrum between channels 13,800 and 14,500, we can test losslessness in one such region. In Figure 9 we visualize the percentage of counts recovered for this subset. As expected, binomial sampling without replacement remains perfectly lossless. Variance-corrected Poisson sampling, on the other hand, produces synthetic children lacking a substantial fraction of the original sparse-region counts. This dramatic decrease in performance likely results from the rounding required to sample whole-number counts in each bin.
Figure 9.
Percentage of counts recovered in the sparse spectral region (channels 13,800–14,500). Binomial sampling without replacement is again the only method that is perfectly lossless, and all other methods show defects. Variance-corrected Poisson is especially problematic.
The second test of losslessness compares the parent spectrum to the sum of the synthetic children for each method, evaluating the fidelity with which each method reconstructs the original data. We use the Wasserstein distance for this comparison. The Wasserstein distance is a metric on probability distributions and can be thought of as the minimum cost of transporting probability mass to transform one distribution into the other. Formally, for probability measures P and Q represented by samples $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$, respectively, the p-Wasserstein distance is

$$W_p(P, Q) = \left( \inf_{\sigma} \frac{1}{n} \sum_{i=1}^{n} \left\lVert x_i - y_{\sigma(i)} \right\rVert^{p} \right)^{1/p},$$

where the infimum is computed over all permutations $\sigma$ of the n samples realized from Q [22].
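On a shared, unit-spaced channel grid, the 1-Wasserstein distance between two histograms reduces to the area between their normalized CDFs, which gives a compact way to compute it. This is an illustrative sketch with our own function and variable names, not the paper's implementation.

```python
import numpy as np

def spectrum_wasserstein(p, q):
    """1-Wasserstein distance between two histograms defined on the same
    unit-spaced channel grid, via the area between normalized CDFs."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cdf_p = np.cumsum(p) / p.sum()
    cdf_q = np.cumsum(q) / q.sum()
    return float(np.abs(cdf_p - cdf_q).sum())

parent = np.array([0, 5, 10, 5, 0], dtype=float)
d_perfect = spectrum_wasserstein(parent, parent)              # identical spectra
d_shifted = spectrum_wasserstein(parent, np.roll(parent, 1))  # one-channel shift
```

A perfect reconstruction gives distance zero, while shifting all counts by one channel costs exactly one unit of transport, which makes the metric easy to interpret on spectral data.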
Figure 10 presents a barplot of Wasserstein distance across methods for all progeny sizes, while Figure 11 presents the same comparison for the sparse region. Binomial sampling without replacement perfectly reconstructs the original data. Inverse transform sampling with partial replacement consistently has the second-smallest Wasserstein distance (less than 0.05 for all progeny sizes) and is barely visible on the plot. The remaining three methods were not significantly different from one another. Comparing Wasserstein distance across sampling scenarios for the sparse subset region yields similar results; consistent with the count-recovery test summarized in Figure 9, variance-corrected Poisson sampling performed much worse than the other methods in this region. As expected, binomial sampling without replacement consistently outperforms all other methods on the Wasserstein metric, with inverse transform sampling with partial replacement a close second.
Figure 10.
Wasserstein distance between the parent spectrum and the summed synthetic child spectra for each subsampling method across progeny sizes as computed for the full spectrum. Binomial sampling without replacement achieves a distance of zero, indicating perfect reconstruction. Inverse-transform sampling with partial replacement is the next closest method, with differences too small to be visible at this scale.
Figure 11.
Wasserstein distance between genuine and reconstructed spectra in the sparse region (channels 13,800–14,500). Binomial sampling without replacement again achieves perfect reconstruction, and inverse-transform sampling with partial replacement is again the next closest method.
3.4. Summary of Evaluation
In this section, we compared the run-to-run variance of synthetic child spectra to that of genuine replicates and found three methods that adequately capture this important source of information: naïve Poisson sampling, binomial sampling without replacement, and inverse transform sampling with replacement. A visual inspection of the relationship between progeny size and the difference in average normalized variance exposed the inadequacy of the variance-corrected Poisson method and the inverse transform with partial replacement at small progeny sizes. Channel-to-channel variance was then measured and significance assessed using the bootstrap. Binomial sampling without replacement was the only sampling technique that successfully preserved channel-to-channel variance in sparse regions. Though variance-corrected Poisson sampling appeared promising at first, a plot of channel count frequency revealed major distributional disagreement between the synthetic and genuine replicates. Lastly, we tested losslessness to determine how well each method preserved the total information present in the parent spectrum. Due largely to sampling without replacement, binomial sampling without replacement and inverse transform sampling with partial replacement outperformed the others in terms of both total counts and summed-spectrum Wasserstein distance.
4. Conclusions
This study evaluated five spectral subsampling methods using three criteria: run-to-run variance, channel-to-channel variance, and losslessness. Our tests show that binomial random sampling without replacement is uniquely capable of preserving key statistical properties of genuine replicate spectra. It avoids artifacts introduced by rounding, maintains proper dispersion even in sparse spectral regions, and perfectly preserves total counts across child spectra. While other methods—such as inverse transform sampling or variance-corrected Poisson sampling—show partial strengths, they fail under certain conditions, particularly at low progeny sizes or in sparse data regions. Rounding, in particular, introduces bias that may not be readily apparent, but which significantly affects spectrum fidelity.
It is worth noting that the selected method, binomial sampling without replacement, does not produce completely independent children. By design, the allocation of a count to one child requires that count to be absent from all other children, inducing a negative correlation across children for a particular channel, given the parent. Thus, child spectra derived from a single parent are dependent on that parent, whereas genuine replicates would be completely independent. For the purposes of this study, the induced dependence is not critical since our central goal is to determine how well different subsampling algorithms reproduce the marginal behavior of individual child spectra. As shown in Appendix A.2, under the binomial sampling regime the marginal distribution of each child matches an independent Poisson measurement with the correct rate parameter, and this result is corroborated by our analysis. Finally, dependence on the parent should be expected from any partitioning algorithm, and it does not alter the conclusions we draw.
Numerous research domains stand to benefit from the binomial sampling approach, including our primary applications of RIID algorithm verification, machine learning for gamma-ray spectroscopy, and limit-of-detection (LOD) studies aimed at strengthening nuclear triage capabilities. Beyond our immediate focus, this method may also offer value in other scientific fields where spectral data are available only in histogram form and list-mode event data are absent. In such contexts, our technique provides a statistically principled means of approximating the behavior of list-mode data without requiring timestamps.
We recommend the adoption of binomial sampling without replacement as a default method for generating synthetic child spectra when only a single parent histogram is available. The approach is mathematically defensible, straightforward to implement, and robust across various sampling scenarios. Software developers relying on alternative methods may benefit from revisiting their implementations in light of these findings.
Future work may examine the robustness of this recommendation for spectra affected by temporal drift or more complex detector behaviors. Nonetheless, this analysis provides a strong foundation for improving the fidelity of synthetic gamma-ray spectra in nuclear safeguards, emergency response, and machine learning applications.
Author Contributions
H.J.H.: methodology; software; writing—original draft; writing—review and editing. T.L.B.: conceptualization; methodology; writing—review and editing. S.C.: conceptualization; writing—review and editing. J.K.: conceptualization. D.J.M.: conceptualization; data curation; methodology; software; writing—original draft; writing—review and editing. A.A.S.: software. T.J.S.III: software; writing—review and editing. E.N.S.: software; writing—review and editing. All authors have read and agreed to the published version of the manuscript.
Funding
This work was funded by the National Nuclear Security Administration (NNSA), Office of Nuclear Incident Response (NA-84).
Data Availability Statement
The original 156 child spectra will be made available to the public.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| LOD | Limit of Detection |
| RIID | Radiation Isotope Identification Devices |
| BAT | Burst Alert Telescope |
| DPH | Detector Plane Histogram |
| XRF | X-ray Fluorescence |
| AIP | Algorithm Improvement Project |
| SNM | Special Nuclear Material |
| GADRAS | Gamma Detector Response and Analysis Software |
| MLP | Multi-Layer Perceptron |
| CNN | Convolutional Neural Network |
| LSTM | Long Short-Term Memory |
| RASE | Replicative Assessment of Spectroscopic Equipment |
| FRAM | Fixed-Energy Response-function Analysis with Multiple efficiencies |
| CDF | Cumulative Distribution Function |
| PMF | Probability Mass Function |
| MLE | Maximum Likelihood Estimation |
Appendix A
Appendix A.1. Variance Derivation for Method 1B
In Method 1B, the variance-corrected child count in each channel is constructed as the sum of two correlated random variables. We seek an expression for the correction parameter in terms of $t_\ell$ and $t_s$ (total live time and total subset time, respectively) that forces the variance of the child count to equal the expected variance of a properly dispersed spectrum. Because the child count is, by construction, the sum of two correlated random variables, its variance can be composed term by term, with Equation (A5) using the law of total variance to propagate the variance of the parent count. Setting the resulting expression (A7) equal to the target variance and solving yields the corrected parameter.
While the parent count is guaranteed to be an integer, the corrected child count is not. Since we require whole-number counts in each channel (we do not detect some fraction of a photon; a photon is either detected or it is not), the corrected count is rounded to the nearest integer.
Appendix A.2. Marginal Distribution Under Binomial Subsampling
Choose

$$X_i' \mid X_i \sim \mathrm{Binomial}\!\left(X_i,\, p\right), \qquad p = \frac{t_s}{t_\ell},$$

where $X_i$ is the count in channel $i$ of the parent spectrum, $X_i'$ is the corresponding count in the child spectrum, and $t_\ell$ and $t_s$ are the total live time and total subset time. To show this technique is mathematically sound, we use the law of total probability to derive the marginal distribution of $X_i'$. We know that $X_i$, the counts in channel $i$ of the parent spectrum, is a Poisson random variable, so let $X_i \sim \mathrm{Poisson}(\lambda_i)$. Then $X_i' \mid X_i \sim \mathrm{Binomial}(X_i, p)$. Hence,

$$P(X_i' = y) = \sum_{x = y}^{\infty} \binom{x}{y} p^{y} (1-p)^{x-y}\, \frac{e^{-\lambda_i} \lambda_i^{x}}{x!} = \frac{e^{-p\lambda_i} \left(p\lambda_i\right)^{y}}{y!},$$

which is the probability mass function (PMF) of a Poisson random variable with rate parameter $p\lambda_i$, where $p = t_s / t_\ell$ and $\lambda_i$ is the parent rate in channel $i$. Thus, $X_i'$ chosen in this manner should have the correct channel-to-channel variance, and display no bias due to rounding. The expected variance of $X_i'$, conditional on $\lambda_i$, for child spectra chosen through this method is $p\lambda_i$.
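The thinning result above can be checked empirically. The following Monte Carlo sketch is illustrative only; the rate and thinning probability are arbitrary values of our choosing.

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 40.0   # parent Poisson rate (arbitrary illustrative value)
p = 0.25     # thinning probability, playing the role of t_s / t_l

parents = rng.poisson(lam, size=200_000)
children = rng.binomial(parents, p)   # binomial thinning of each parent count

# If the Poisson(p * lam) marginal holds, the mean and variance of the
# children should both be near p * lam = 10.
m = children.mean()
v = children.var(ddof=1)
```

The agreement of the empirical mean and variance with $p\lambda$ is exactly the "correct channel-to-channel variance" property used in the main text.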
References
- NASA/GSFC HEASARC. Batsurvey—Perform BAT Survey Imaging Analysis. HEASoft. 2008. Available online: https://heasarc.gsfc.nasa.gov/docs/software/lheasoft/help/batsurvey.html (accessed on 21 December 2025).
- XRD & XRF Raw Data. Available online: https://doi.org/10.17632/nkpmdtdkfw.1 (accessed on 21 December 2025).
- Figueroa-Rosales, E.X.; Martínez-Juárez, J.; García-Díaz, E.; Hernández-Cruz, D.; Sabinas-Hernández, S.A.; Robles-Águila, M.J. Photoluminescent properties of hydroxyapatite and hydroxyapatite/multi-walled carbon nanotube composites. Crystals 2021, 11, 832.
- Enghauser, M. Algorithm Improvement Program Nuclide Identification Algorithm Scoring Criteria and Scoring Application; Technical Report; Sandia National Laboratories (SNL-NM): Albuquerque, NM, USA, 2016.
- Mitchell, D.J.; Harding, L.; Thoreson, G.G.; Horne, S.M. GADRAS Detector Response Function; Technical Report; Sandia National Laboratories (SNL-NM): Albuquerque, NM, USA, 2014.
- Fournier, S.D.; Enghauser, M.; Leonard, E.J.; Thoreson, G.G. GADRAS Batch Inject Tool User Guide; Technical Report; Sandia National Laboratories (SNL-NM): Albuquerque, NM, USA, 2020.
- Lalor, P.; Adams, H.; Hagen, A. Sim-to-real supervised domain adaptation for radioisotope identification. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 2026, 1083, 171159.
- PyRIID v.2.0.0. Available online: https://doi.org/10.11578/dc.20221017.2 (accessed on 21 December 2025).
- Kwon, J.; Kim, J.; Kim, H.; Kim, S.; Jang, S.; Lee, J.; Kim, Y.S. Development of gamma-spectrum data generation method by Monte Carlo simulation. J. Korean Phys. Soc. 2023, 82, 658–670.
- Agostinelli, S.; Allison, J.; Amako, K.; Apostolakis, J.; Araujo, H.; Arce, P.; Asai, M.; Axen, D.; Banerjee, S.; Barrand, G.; et al. Geant4—A simulation toolkit. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 2003, 506, 250–303.
- Chavez, J.R.; Czyz, S.A.; Sangiorgio, S.; Brodsky, J.P.; Kosinovsky, G.A. Replicative Assessment of Spectroscopic Equipment; Lawrence Livermore National Laboratory (LLNL): Livermore, CA, USA, 2020.
- Arlt, R.; Baird, K.; Blackadar, J.; Blessenger, C.; Blumenthal, D.; Chiaro, P.; Frame, K.; Mark, E.; Mayorov, M.; Milovidov, M.; et al. Semi-empirical approach for performance evaluation of radionuclide identifiers. In Proceedings of the 2009 IEEE Nuclear Science Symposium Conference Record (NSS/MIC), Orlando, FL, USA, 24 October–1 November 2009; pp. 990–994.
- Flynn, A.; Boardman, D.; Reinhard, M.I. The validation of synthetic spectra used in the performance evaluation of radionuclide identifiers. Appl. Radiat. Isot. 2013, 77, 145–152.
- Vo, D.T.; Sampson, T.E. FRAM, Version 6.1 User Manual; Technical Report; Los Alamos National Laboratory (LANL): Los Alamos, NM, USA, 2020.
- Burr, T.; Hamada, M.S.; Graves, T.L.; Myers, S. Augmenting real data with synthetic data: An application in assessing radio-isotope identification algorithms. Qual. Reliab. Eng. Int. 2009, 25, 899–911.
- Burr, T.; Hamada, M. Radio-isotope identification algorithms for NaI γ spectra. Algorithms 2009, 2, 339–360.
- Raikov, D. On the decomposition of Gauss and Poisson laws. Izv. Math. 1938, 2, 91–124.
- Steutel, F.W.; van Harn, K. Discrete analogues of self-decomposability and stability. Ann. Probab. 1979, 7, 893–899.
- Weiss, C.H. Thinning operations for modeling time series of counts—A survey. Adv. Stat. Anal. 2008, 92, 319–341.
- Bossew, P. A very long-term HPGe-background gamma spectrum. Appl. Radiat. Isot. 2005, 62, 635–644.
- Wang, Y.; Liu, Y.; Wu, B.; Meng, X.; Wang, J.; Cheng, J. Reconstruction of indoor gamma-ray background spectrum for HPGe detectors. Radiat. Meas. 2024, 174, 107139.
- Panaretos, V.M.; Zemel, Y. Statistical aspects of Wasserstein distances. Annu. Rev. Stat. Its Appl. 2019, 6, 405–431.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.