Robust Method for Confidence Interval Estimation in Outlier-Prone Datasets: Application to Molecular and Biophysical Data

Golovko, Victor V.

doi:10.3390/biom15050704


Canadian Nuclear Laboratories, 286 Plant Road, Chalk River, ON K0J 1J0, Canada

Biomolecules 2025, 15(5), 704; https://doi.org/10.3390/biom15050704

Submission received: 27 April 2025 / Revised: 8 May 2025 / Accepted: 9 May 2025 / Published: 12 May 2025

(This article belongs to the Topic Bioinformatics in Drug Design and Discovery—2nd Edition)


Abstract


Estimating confidence intervals in small or noisy datasets is a recurring challenge in biomolecular research, particularly when data contain outliers or exhibit high variability. This study introduces a robust statistical method that combines a hybrid bootstrap procedure with Steiner’s most frequent value (MFV) approach to estimate confidence intervals without removing outliers or altering the original dataset. The MFV technique identifies the most representative value while minimizing information loss, making it well suited for datasets with limited sample sizes or non-Gaussian distributions. To demonstrate the method’s robustness, we intentionally selected a dataset from outside the biomolecular domain: a fast-neutron activation cross-section of the ¹⁰⁹Ag(n, 2n)^108mAg reaction from nuclear physics. This dataset presents large uncertainties, inconsistencies, and known evaluation difficulties. Confidence intervals for the cross-section were determined using a method called the MFV–hybrid parametric bootstrapping (MFV-HPB) framework. In this approach, the original data points were repeatedly resampled, and new values were simulated based on their uncertainties before the MFV was calculated. Despite the dataset’s complexity, the method yielded a stable MFV estimate of 709 mb with a 68.27% confidence interval of [691, 744] mb, illustrating the method’s ability to provide interpretable results in challenging scenarios. Although the example is from nuclear science, the same statistical issues commonly arise in biomolecular fields, such as enzymatic kinetics, molecular assays, and diagnostic biomarker studies. The MFV-HPB framework provides a reliable and generalizable approach for extracting central estimates and confidence intervals in situations where data are difficult to collect, replicate, or interpret. 
Its resilience to outliers, independence from distributional assumptions, and compatibility with small-sample scenarios make it particularly valuable in molecular medicine, bioengineering, and biophysics.

Keywords:

most frequent value; hybrid parametric bootstrapping; robust statistical method; fast-neutron activation cross-section of the ¹⁰⁹Ag(n, 2n)^108mAg reaction; half-life of the ^108mAg

1. Introduction

Reliable statistical analysis is essential in many areas of biomolecular sciences, including bioinformatics [1], biomedical diagnostics [2], and bioprocess engineering [3]. Researchers in these fields often work with complex datasets that are small in size, contain outliers, or deviate from normal (Gaussian) distributions [4]. These characteristics can make conventional statistical approaches, such as the arithmetic mean or least squares fitting, less reliable or misleading.

The most frequent value (MFV) method is a robust alternative that estimates the central value of a dataset based on its densest region. This makes the method highly resistant to outliers and better suited for datasets with irregular structures or measurement noise. Because it preserves more of the original information in the data, the MFV is particularly valuable in biomolecular contexts, where measurements are often costly or variable, and data points cannot be easily discarded.

In bioinformatics, the MFV can improve the analysis of high-dimensional datasets, such as gene expression profiles, protein folding simulations, and pathway activity models [5]. These analyses frequently involve biological replicates with different levels of noise, and the MFV offers a stable way to determine representative values that are not overly influenced by outliers. For instance, RNA-seq or proteomics data often contain both biological and technical variability [6], and the MFV can help extract meaningful trends under such conditions.

In biomedical diagnostics, sensor readings, imaging data, and molecular assays can be affected by noise, sampling inconsistencies, or measurement anomalies. The MFV allows these values to be processed more reliably. When combined with bootstrapping—a method that estimates statistical confidence intervals by repeatedly resampling the dataset—the MFV approach provides robust estimates of variability without assuming a specific data distribution [7,8]. This combination is particularly useful when working with small or irregularly sampled datasets, which often occur in clinical research.

In bioprocess engineering, where biological production systems are optimized and monitored, variability in feedstock, growth rates, and process parameters can complicate the analysis of production yields or reaction efficiencies [9,10]. The MFV method can help identify representative performance metrics in the presence of such variations, whereas bootstrapping provides reliable uncertainty estimates to support process decisions and quality control.

Bootstrapping is especially useful when the data distribution is unknown or non-Gaussian. Traditional confidence interval (CI) estimation methods [11] may not perform well under these conditions. Bootstrapping addresses this by generating distributions of the statistic of interest from the resampled datasets, thereby making it highly flexible. When used with the MFV, this technique supports statistically sound conclusions even in the presence of extreme values or limited sample sizes.

This study presents a practical approach for estimating confidence intervals by integrating the MFV with both traditional and hybrid parametric bootstrapping. The proposed method is designed to be broadly applicable, particularly in cases where datasets are small, variable, or contain outliers. Although the method is demonstrated using a dataset from nuclear physics—specifically, activation cross-section measurements—the statistical issues addressed here are directly relevant to biomolecular sciences. These include challenges encountered in protein–ligand binding assays, enzymatic rate measurements, and molecular biomarker evaluations, such as high-sensitivity C-reactive protein measurements in liver disease studies [12], where population variability, sex-based differences, and limited measurement precision can introduce noise and uncertainty into the analysis.

Historically, before the 1960s, the concept of the most frequent value was known but was rarely used because of computational constraints. The arithmetic mean and Gaussian-based least squares methods were more common even though many real-world datasets did not meet the assumptions required by these methods.

A shift began in the 1970s and 1980s, when researchers such as Steiner, Csernyák, and Hajagos formalized the MFV method and demonstrated its practical advantages. The key contributions from this period include the following:

Csernyák and Steiner (1980) [13] introduced a practical way to compute the MFV’s scale parameter, known as dihesion.
Steiner (1980) [14] demonstrated that the MFV offers greater resistance to outliers than traditional least squares fitting.
Csernyák, Hajagos, and Steiner (1981) [15] established mathematical foundations for the convergence of MFV estimates.
Hajagos (1982) [16] demonstrated that the MFV method minimizes information loss when estimating central values.

These developments were compiled in Steiner’s 1988 monograph, Most Frequent Value Procedures [17], which helped expand the use of the MFV beyond geophysics and into other scientific domains.

This study builds on that foundation by showing how the MFV and bootstrapping can be combined to improve data analysis in biomolecular research. Together, they provide a statistically robust framework for analyzing datasets that are small, noisy, or include extreme values—features commonly found in molecular biology, bioassays, and clinical studies. The proposed method enhances confidence in statistical inferences without compromising the integrity of the original data.

Artificial datasets have been used in past studies to demonstrate the performance of the MFV algorithm. For instance, Golovko et al. (2023) [18] compared the mode statistic with the MFV statistic using an artificial dataset, and Golovko (2025) [19] introduced the MFV–hybrid parametric bootstrapping framework with a small four-element dataset that had varied uncertainties. In this study, by contrast, we chose real-world nuclear physics data. We selected this dataset because of its natural variability and tendency to contain outliers, which make it a demanding test of the robustness and applicability of the proposed framework. Using real data not only demonstrates the method’s effectiveness but also indicates its potential usefulness across different scientific fields.

2. Methodology

This study emphasizes the application of robust statistical methods, particularly the most frequent value approach [20], as an alternative to traditional averaging techniques based on the least squares principle [21]. The MFV method addresses the key limitations inherent in standard methods such as the arithmetic mean, which assumes that the underlying data follow a Gaussian (normal) distribution. In many real-world scenarios, error distributions significantly deviate from Gaussian assumptions, rendering traditional methods inefficient and prone to inaccuracies.

Traditional methods, such as the arithmetic mean, are optimal only when the data follow a Gaussian distribution. For non-Gaussian distributions, they require substantially more data than robust alternatives to achieve a similar level of accuracy [17]. In addition, least squares techniques are highly sensitive to outliers, that is, data points that deviate significantly from the main data cluster. These outliers can disproportionately influence the resulting estimates, leading to skewed or misleading results.

The MFV approach overcomes these limitations by identifying the densest cluster of data points [22], effectively reducing the influence of outliers. Unlike traditional averaging methods, the MFV is designed to handle non-Gaussian error distributions [23] and provides more reliable central tendency estimates for datasets with irregular or skewed distributions.

The MFV method offers several notable advantages over traditional averaging techniques such as the arithmetic mean. A key benefit of the MFV is its robustness to outliers [24]. Unlike traditional methods, which can be heavily influenced by extreme data points, the MFV is based on the concept of minimizing information loss [25], so that the central estimate reflects the majority of the data. This makes it particularly effective in scenarios where datasets contain irregular or extreme values [26]. In addition, the MFV applies to non-Gaussian distributions, which are frequently encountered in many practical fields. Traditional methods often assume Gaussian error distributions, limiting their effectiveness when this assumption is not met. The MFV, on the other hand, is well suited for datasets with non-standard distributions [27].

Another significant advantage of the MFV is its efficiency. By concentrating on the densest cluster of data [28], the MFV provides accurate estimates with fewer data points, thereby reducing the need for extensive sampling. Furthermore, the method improves accuracy by avoiding biases introduced by extreme values, thereby producing results that are closer to the true characteristics of the data. These features make the MFV a significant improvement over traditional least squares techniques, particularly in applications where data variability and non-standard distributions are common [29]. The MFV is therefore a robust, efficient, and reliable alternative for statistical analysis.

Although the MFV method and its scale parameter (also referred to as “dihesion”) were originally developed by Steiner and have been discussed in various papers [30] and books [31,32], many of these resources are written for readers already familiar with the approach. As a result, they often skip the step-by-step derivation of the key equations used to calculate the central value and spread. This can make it difficult for researchers new to the MFV method to fully understand how it works or how to apply it effectively.

To address this gap, we have included a detailed explanation of how these equations are derived. The following section describes the process based on the principle of minimizing information loss between the observed dataset and a substituting analytical distribution. Our goal is to make the MFV method more accessible and transparent and easier to apply in modern statistical analysis, especially when working with small datasets or those containing outliers.

The MFV method, as a robust statistical estimate [18], is particularly powerful when paired with bootstrapping techniques to estimate confidence intervals (CIs). This combination makes the CI far less sensitive to the presence of outliers in the dataset. Unlike the mean statistic, which is not robust and can be significantly distorted by extreme values, the MFV minimizes the influence of outliers by focusing on the densest cluster of data. This makes bootstrapping the MFV-based CI a useful approach for datasets with irregular distributions or outliers, and an effective alternative to traditional mean-based methods for estimating CIs.

2.1. The Most Frequent Value

The Cauchy distribution [33] used in this section is referred to by various names, especially in physics, such as the Lorentz distribution [34], the Cauchy–Lorentz distribution, the Lorentzian function, or the Breit–Wigner distribution [35]. The importance of the Cauchy function lies in its role in most frequent value calculations. It allows for the preservation of information accuracy by substituting an unknown probability distribution with a known one. This practice is based on the principles of information theory and is specifically centered around the idea of Kullback–Leibler divergence [36]. This divergence measures information loss when a theoretical or analytical distribution approximates the true (but unknown) distribution.

The Cauchy distribution provides several advantages as the substituting distribution in this framework. One significant advantage is that it can stand in for a broad spectrum of possible true distributions, including those with heavier tails than the Gaussian distribution. The Cauchy form also arises naturally: if A and B are two statistically independent random variables with standard Gaussian distributions, the ratio A/B follows a Cauchy distribution, which is a somewhat unexpected outcome [37]. Unlike the Gaussian distribution, which may lead to infinite divergence for some datasets, the Cauchy distribution ensures finite Kullback–Leibler divergence. This makes it particularly effective for handling non-standard distributions, which are commonly encountered in real-world data.
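The statement that the ratio of two independent standard Gaussian variables is Cauchy-distributed can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function name, sample size, and seed are illustrative assumptions. It relies on the fact that a standard Cauchy distribution (location 0, scale 1) has quartiles at -1, 0, and +1.

```python
import numpy as np

def gaussian_ratio_quartiles(n=200_000, seed=42):
    """Quartiles of A/B for independent standard Gaussian A and B.

    The function name, n, and seed are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    ratio = a / b  # b is exactly zero with probability zero
    return np.quantile(ratio, [0.25, 0.50, 0.75])

q25, q50, q75 = gaussian_ratio_quartiles()
# For a standard Cauchy distribution the quartiles are -1, 0, +1,
# so the printed values should lie close to those numbers.
print(q25, q50, q75)
```

Quartiles are used for the comparison because the mean and variance of a Cauchy distribution do not exist, so sample means of the ratio would not settle down as n grows.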

Minimizing the Kullback–Leibler divergence with the Cauchy distribution results in defining equations for the MFV (as the location parameter) and dihesion (as the scale parameter). The Cauchy distribution also acts as an ideal weight function in these calculations. It assigns higher weights to data points near the central cluster and progressively downweights the influence of outliers, which enhances the robustness of the MFV estimation. This property ensures that the MFV captures the true central tendency of the data without distortion by extreme values.

Although alternative approaches such as the maximum likelihood principle can also be used to derive MFV and dihesion formulas [17], they rely on the assumption that the true error distribution is known. The Kullback–Leibler divergence method acknowledges the uncertainty inherent in most practical scenarios and focuses on finding the best substituting distribution.

In addition, the Cauchy distribution offers another crucial advantage. Its robustness ensures a finite asymptotic variance for the MFV [16,38], even in datasets with significant outliers. This makes MFV calculations based on the Cauchy function a highly reliable and practical choice for estimating central tendencies in diverse datasets.

The concept of minimizing information loss addresses a fundamental challenge in data analysis: approximating an unknown true probability distribution with a known analytical distribution. This approach is especially important in fields of science where the true error distribution is often unknown and deviations from standard distributions such as the Gaussian distribution are common. By quantifying and minimizing information loss, scientists can make more informed decisions and extract reliable results from limited or uncertain datasets.

A key tool in this framework is the Kullback–Leibler (KL) divergence (also called relative information entropy [39,40] and I-divergence [17,41]), a measure of the difference between the actual distribution f(x) and a substituting distribution g(x). Mathematically, the Kullback–Leibler divergence is expressed as

D_{KL} (f (x) ∥ g (x)) = \int_{- \infty}^{\infty} f (x) log (\frac{f (x)}{g (x)}) d x,

(1)

where f(x) is the true (unknown) distribution, and g(x) is the substituting distribution, which depends on the location parameter x_{0}. The goal is to minimize D_{KL}(f(x) ∥ g(x)) with respect to x_{0}. The Kullback–Leibler divergence shows how different the two functions g(x) and f(x) are by measuring the amount of information lost when g(x) is used to approximate f(x). Although it is often called a statistical “distance”, it is not a true distance because it is not symmetric and does not satisfy the triangle inequality.
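To make the minimization over x_{0} concrete, the sketch below evaluates Equation (1) numerically for one illustrative case. It assumes a standard Gaussian as the "true" f(x) (in practice f(x) is unknown) and a Cauchy-shaped g(x) of the kind discussed above with scale 1; the function name and integration grid are also assumptions made for this example.

```python
import numpy as np

def kl_gaussian_vs_cauchy(x0, gamma=1.0):
    """Numerically evaluate Equation (1) for f = N(0, 1) and a Cauchy g.

    f is chosen only for illustration; the grid limits and spacing are
    assumptions that are adequate for a standard Gaussian.
    """
    x = np.linspace(-12.0, 12.0, 24001)
    dx = x[1] - x[0]
    f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    g = gamma / (np.pi * (gamma**2 + (x - x0) ** 2))
    return float(np.sum(f * np.log(f / g)) * dx)  # simple Riemann sum

# The divergence grows as the Cauchy location moves away from the
# center of f, so the minimum over x0 sits at the center of the data.
for x0 in (0.0, 0.5, 1.0, 2.0):
    print(x0, kl_gaussian_vs_cauchy(x0))
```

In this symmetric example the minimizing location coincides with the mean of f(x); the point of the MFV derivation is that the same criterion still gives a sensible location when f(x) is skewed or heavy-tailed.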

Let g(x; x_{0}, γ) be the Cauchy distribution [37,40]

g (x; x_{0}, γ) = \frac{1}{π} \cdot \frac{γ}{γ^{2} + {(x - x_{0})}^{2}},

(2)

where γ is the scale parameter (also called the half-width at half-maximum), and x_{0} represents the location parameter. The conditions for minimizing the quasi-distance between g(x) and f(x), as measured by the Kullback–Leibler divergence, are fulfilled if the following equations hold:

\frac{d D_{KL} (f (x) ∥ g (x))}{d x_{0}} = 0,

(3)

and

\frac{d^{2} D_{KL} (f (x) ∥ g (x))}{d x_{0}^{2}} > 0 .

(4)

Using the expression for D_{KL}(f(x) ∥ g(x)) from Equation (1), the condition in Equation (3) becomes

\int_{- \infty}^{\infty} \frac{\partial g (x; x_{0}, γ)}{\partial x_{0}} \cdot \frac{f (x)}{g (x; x_{0}, γ)} d x = 0,

(5)

and the second-derivative condition in Equation (4) requires

\int_{- \infty}^{\infty} {[\frac{\partial g (x; x_{0}, γ)}{\partial x_{0}} \cdot \frac{1}{g (x; x_{0}, γ)}]}^{2} f (x) d x - \int_{- \infty}^{\infty} \frac{\partial^{2} g (x; x_{0}, γ)}{\partial x_{0}^{2}} \cdot \frac{f (x)}{g (x; x_{0}, γ)} d x > 0 .

(6)

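Equation (6) is automatically fulfilled if the second term vanishes, because the first term is the integral of a squared quantity weighted by the density f(x) and is therefore non-negative (and positive except in degenerate cases). In other words,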

\int_{- \infty}^{\infty} \frac{\partial^{2} g (x; x_{0}, γ)}{\partial x_{0}^{2}} \cdot \frac{f (x)}{g (x; x_{0}, γ)} d x = 0,

(7)

and the simultaneous satisfaction of Equations (5) and (7) results in values of x_{0} that guarantee the minimum Kullback–Leibler divergence.

For the Cauchy distribution, the partial derivative of g(x; x_{0}, γ) with respect to x_{0} is given by

\frac{\partial g (x; x_{0}, γ)}{\partial x_{0}} = \frac{2 (x - x_{0})}{π} \cdot \frac{γ}{{[γ^{2} + {(x - x_{0})}^{2}]}^{2}} .

(8)

Substituting this expression into the integral condition (5) yields the following:

\int_{- \infty}^{\infty} \frac{x - x_{0}}{γ^{2} + {(x - x_{0})}^{2}} f (x) d x = 0 .

(9)

This integral condition determines the optimal value of x_{0}, ensuring that the substituting distribution g(x; x_{0}, γ) minimizes the Kullback–Leibler divergence relative to the unknown true distribution f(x). To simplify it, we expand

\frac{x - x_{0}}{γ^{2} + {(x - x_{0})}^{2}} = \frac{x}{γ^{2} + {(x - x_{0})}^{2}} - \frac{x_{0}}{γ^{2} + {(x - x_{0})}^{2}},

(10)

which leads to

\int_{- \infty}^{\infty} [\frac{x}{γ^{2} + {(x - x_{0})}^{2}} - \frac{x_{0}}{γ^{2} + {(x - x_{0})}^{2}}] f (x) d x = 0 .

(11)

After separating the terms and rearranging them, we isolate x_{0} as

x_{0} = \frac{\int_{- \infty}^{\infty} \frac{x}{γ^{2} + {(x - x_{0})}^{2}} f (x) d x}{\int_{- \infty}^{\infty} \frac{1}{γ^{2} + {(x - x_{0})}^{2}} f (x) d x} .

(12)

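This formula calculates a weighted average of the data, where points close to x_{0} receive larger weights than those farther away. Because x_{0} also appears on the right-hand side, Equation (12) is an implicit relation that has to be solved iteratively. This location parameter, x_{0}, is commonly referred to as M in the MFV methodology.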

The next step involves isolating the scale parameter through minimization of the Kullback–Leibler divergence. We use Equation (7) for this purpose. Beginning with the first partial derivative given in Equation (8), the second derivative of g(x; x_{0}, γ) with respect to x_{0} becomes

\frac{\partial^{2} g (x; x_{0}, γ)}{\partial x_{0}^{2}} = \frac{2 γ}{π} \cdot \frac{3 {(x - x_{0})}^{2} - γ^{2}}{{[γ^{2} + {(x - x_{0})}^{2}]}^{3}} .

(13)

Substituting Equation (13) into Equation (7) leads to

\int_{- \infty}^{\infty} \frac{3 {(x - x_{0})}^{2} - γ^{2}}{{[γ^{2} + {(x - x_{0})}^{2}]}^{2}} f (x) d x = 0,

(14)

which can be rearranged to isolate γ^{2} as follows:

γ^{2} = 3 \cdot \frac{\int_{- \infty}^{\infty} \frac{{(x - x_{0})}^{2}}{{[γ^{2} + {(x - x_{0})}^{2}]}^{2}} f (x) d x}{\int_{- \infty}^{\infty} \frac{1}{{[γ^{2} + {(x - x_{0})}^{2}]}^{2}} f (x) d x} .

(15)

In the MFV framework, this scale parameter is typically referred to as ε in place of γ.

For a finite-sample dataset {x_{1}, x_{2}, \dots, x_{n}}, the empirical density, a sum of Dirac delta functions δ centered on the observations,

f (x) = \frac{1}{n} \sum_{i = 1}^{n} δ (x - x_{i})

(16)

is used. Substituting the empirical function into Equations (12) and (15) provides the sample-based MFV, M_{n}, and dihesion, ε_{n}, through

M_{n} = \frac{\sum_{i = 1}^{n} \frac{x_{i}}{ε_{n}^{2} + {(x_{i} - M_{n})}^{2}}}{\sum_{i = 1}^{n} \frac{1}{ε_{n}^{2} + {(x_{i} - M_{n})}^{2}}},

(17)

and

ε_{n}^{2} = 3 \cdot \frac{\sum_{i = 1}^{n} \frac{{(x_{i} - M_{n})}^{2}}{{[ε_{n}^{2} + {(x_{i} - M_{n})}^{2}]}^{2}}}{\sum_{i = 1}^{n} \frac{1}{{[ε_{n}^{2} + {(x_{i} - M_{n})}^{2}]}^{2}}},

(18)

which define M_{n} and ε_{n} through weighted sums that strongly favor points close to M_{n}, effectively suppressing the influence of outliers.

Because M_{n} appears in the equation for ε_{n} and vice versa, each equation depends on the current estimate of the other quantity, so an iterative procedure is required to solve them. Commonly, one starts with

M_{n}^{(0)} = \frac{1}{n} \sum_{i = 1}^{n} x_{i},

(19)

ε_{n}^{(0)} = \frac{\sqrt{3}}{2} \cdot (x_{\max} - x_{\min}),

(20)

or one can choose the median of the dataset for M_{n}^{(0)} if the data contain extreme outliers [42]. The iterative updates are

M_{n}^{(k + 1)} = \frac{\sum_{i = 1}^{n} \frac{x_{i}}{{(ε_{n}^{(k)})}^{2} + {(x_{i} - M_{n}^{(k)})}^{2}}}{\sum_{i = 1}^{n} \frac{1}{{(ε_{n}^{(k)})}^{2} + {(x_{i} - M_{n}^{(k)})}^{2}}},

(21)

{(ε_{n}^{(k + 1)})}^{2} = 3 \cdot \frac{\sum_{i = 1}^{n} \frac{{(x_{i} - M_{n}^{(k)})}^{2}}{{[{(ε_{n}^{(k)})}^{2} + {(x_{i} - M_{n}^{(k)})}^{2}]}^{2}}}{\sum_{i = 1}^{n} \frac{1}{{[{(ε_{n}^{(k)})}^{2} + {(x_{i} - M_{n}^{(k)})}^{2}]}^{2}}},

(22)

until the changes in both M_{n} and ε_{n} become negligible.
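The iteration in Equations (19)–(22) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `mfv`, the tolerance, the iteration cap, and the small numerical floor on ε_{n} (added so that repeated identical values cannot cause a division by zero) are not taken from the text.

```python
import numpy as np

def mfv(x, tol=1e-10, max_iter=1000):
    """Iterate Equations (21) and (22); returns (M_n, eps_n).

    tol, max_iter, and the floor on eps are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0.0:                       # all values identical
        return float(x[0]), 0.0
    m = x.mean()                          # Eq. (19)
    eps = np.sqrt(3.0) / 2.0 * span       # Eq. (20)
    floor = 1e-12 * span                  # numerical safeguard (assumption)
    for _ in range(max_iter):
        d2 = (x - m) ** 2
        w = 1.0 / (eps**2 + d2)           # Cauchy-type weights
        m_new = np.sum(w * x) / np.sum(w)                       # Eq. (21)
        eps_new = np.sqrt(3.0 * np.sum(d2 * w**2) / np.sum(w**2))  # Eq. (22)
        eps_new = max(eps_new, floor)
        converged = abs(m_new - m) < tol and abs(eps_new - eps) < tol
        m, eps = m_new, eps_new
        if converged:
            break
    return float(m), float(eps)

# Hypothetical values: a tight cluster near 10 plus one gross outlier.
data = [9.8, 9.9, 10.0, 10.1, 10.2, 50.0]
print("mean:", np.mean(data))
print("MFV, dihesion:", mfv(data))
```

For these made-up numbers the arithmetic mean is pulled well above the cluster by the single outlier, whereas the MFV stays close to 10. Starting from the median instead of the mean, as mentioned above, would only change the line marked Eq. (19).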

The final values, M_{n} and ε_{n}, provide robust estimates of the central tendency (location) and spread (scale) of the dataset. Unlike traditional means and standard deviations, these quantities are far less affected by outliers or heavy-tailed distributions.

It can be helpful to see why Equations (17) and (18) look like weighted averages and how they confer robustness. Minimizing Kullback–Leibler divergence with a Cauchy form imposes a higher weight on data points that lie close to the current location estimate M_{n}. The term ε_{n}^{2} + {(x_{i} - M_{n})}^{2} in the denominator becomes large for points lying far from M_{n}; thus, distant points carry less influence on the estimates. Consequently, outliers do not inflate the location or spread values as strongly as they would in the case of standard arithmetic means or variances.
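The weighting can be made explicit with a short sketch. The helper `cauchy_weights` and the numbers passed to it are hypothetical; it simply evaluates the weights from Equation (17) for a given location and dihesion and normalizes them to sum to one.

```python
import numpy as np

def cauchy_weights(x, m, eps):
    """Normalized weights 1 / (eps^2 + (x_i - m)^2) used in Equation (17)."""
    x = np.asarray(x, dtype=float)
    w = 1.0 / (eps**2 + (x - m) ** 2)
    return w / w.sum()

# Hypothetical example: three clustered points and one distant point,
# evaluated at an assumed location m = 10 and dihesion eps = 0.2.
weights = cauchy_weights([9.8, 10.0, 10.2, 50.0], m=10.0, eps=0.2)
print(weights)
```

With these illustrative inputs the point at 50 receives a normalized weight on the order of 1e-5, whereas in an arithmetic mean every point would carry a weight of 0.25.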

Intuitively, the most frequent value M_{n} emerges as the “peak” of a Cauchy curve positioned to match the core mass of the data, with minimal impact from points on the periphery. Similarly, the dihesion ε_{n} quantifies how widely the main cluster of data is spread around M_{n}, again downweighting extreme values that might otherwise dominate a conventional variance calculation. As a result, these MFV estimates remain stable and representative even when the dataset exhibits strong deviations from Gaussian assumptions or contains significant outliers.

By iteratively updating M_{n} and ε_{n} until convergence, we obtain final values that minimize information loss under Cauchy substitution. This procedure directly addresses the requirement to handle non-Gaussian errors and ensures that the substitution remains valid for a broad range of real-world data distributions. This robustness makes the MFV a valuable alternative to traditional least squares or mean-based methods, offering more reliable central estimates and spread measurements in diverse applications.

It is important to emphasize that the function f(x) represents the unknown “true” distribution of the real-world data. In practice, this distribution can be arbitrary and does not need to follow any standard or symmetric shape, such as a Gaussian. The MFV approach addresses this general case by substituting an unknown distribution with a Cauchy distribution in a way that minimizes Kullback–Leibler divergence. This ensures that the estimated location (and scale) parameters remain robust even when the data are drawn from a heavy-tailed or skewed distribution.

2.2. Bootstrapping for Robust Confidence Interval Estimation

Bootstrapping is a non-parametric resampling technique used to estimate the variability and confidence intervals of a statistical measure without making strong assumptions about the underlying data distribution [43,44]. This approach is particularly useful when working with the MFV because it allows for robust confidence interval estimation while minimizing sensitivity to outliers and small sample sizes.

The procedure generates multiple resampled datasets (termed “bootstrap samples”) by randomly sampling the original dataset with replacement. For each bootstrap sample, the statistic of interest, such as the MFV, is recalculated. This results in an empirical distribution of the statistic from which confidence intervals can be derived using the percentile method [45,46]. The confidence interval (CI) is obtained by identifying the appropriate quantiles of the bootstrap distribution. For instance, a 95.45% CI is defined as follows:

{CI}_{95.45 %} = [Q_{0.02275}, Q_{0.97725}],

(23)

where Q_{α} represents the α-th quantile of the bootstrap distribution. Similarly, a 68.27% CI uses the 15.87th and 84.13th percentiles.
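The percentile method described above can be sketched as follows. The function name, default arguments, and the use of `np.median` as a stand-in statistic are assumptions made to keep the example short; any statistic, including an MFV routine, could be passed in its place.

```python
import numpy as np

def bootstrap_percentile_ci(data, statistic, level=0.6827, n_boot=2000, seed=0):
    """Percentile bootstrap CI: resample with replacement, then take quantiles."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(data, size=data.size, replace=True)
        stats[b] = statistic(sample)
    alpha = (1.0 - level) / 2.0           # e.g. 0.02275 for a 95.45% CI
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(lo), float(hi)

# Hypothetical data; np.median stands in for the statistic of interest.
data = np.arange(1.0, 22.0)
print(bootstrap_percentile_ci(data, np.median, level=0.6827))
print(bootstrap_percentile_ci(data, np.median, level=0.9545))
```

Because both calls use the same seed, they are computed from the same bootstrap replicates, so the 95.45% interval necessarily contains the 68.27% interval.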

This approach offers several advantages. First, it is highly robust [47] against outliers, particularly when combined with a robust statistic such as the MFV. By incorporating resampling techniques, bootstrapping accounts for data variability while reducing the influence of extreme values on the confidence interval estimation. Second, bootstrapping is flexible and does not require assumptions about the underlying distribution of data. This makes it a valuable tool in real-world applications where the data structure is complex or unknown.

For small datasets that include measurement uncertainties, the hybrid parametric bootstrap (HPB) method offers a powerful way to estimate confidence intervals [48,49]. This approach combines two statistical techniques—non-parametric resampling and parametric simulation—to account for both the natural variability in the data and the uncertainties reported with each measurement.

The HPB process begins with a non-parametric bootstrap step. From the original dataset of N measurements, a new synthetic dataset is created by randomly sampling with replacement. This means that some measurements may appear more than once, while others may not be selected at all. This resampling mimics the uncertainty about which data points best represent the underlying distribution.

Next, each resampled data point is used to generate a simulated value through a parametric step. For each selected point, a new value is randomly drawn from a normal (Gaussian) distribution centered on the original measurement, with the corresponding reported uncertainty used as the standard deviation. If a simulated value is physically impossible—such as a negative cross-section or half-life—it is discarded and redrawn. This ensures all values remain physically meaningful.

After both steps are completed, the most frequent value is calculated for the simulated dataset. This entire two-step process is repeated many times, generating a distribution of MFV estimates. Confidence intervals, such as the 68.27% range (equivalent to 1-sigma in Gaussian statistics), are then calculated using the percentile method [45,46].
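The two-step procedure can be sketched as below. This is an illustrative outline under stated assumptions, not the code used to produce the reported results: the function names, the number of bootstrap replicates, the convergence settings in the compact `mfv` helper, and the input numbers are all hypothetical.

```python
import numpy as np

def mfv(x, tol=1e-9, max_iter=500):
    """Compact MFV iteration (Eqs. (19)-(22)); returns the location only."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0.0:
        return float(x[0])
    m, eps = x.mean(), np.sqrt(3.0) / 2.0 * span
    for _ in range(max_iter):
        d2 = (x - m) ** 2
        w = 1.0 / (eps**2 + d2)
        m_new = np.sum(w * x) / np.sum(w)
        eps_new = max(np.sqrt(3.0 * np.sum(d2 * w**2) / np.sum(w**2)),
                      1e-12 * span)       # numerical floor (assumption)
        done = abs(m_new - m) < tol and abs(eps_new - eps) < tol
        m, eps = m_new, eps_new
        if done:
            break
    return float(m)

def draw_nonnegative(rng, mean, sigma):
    """Draw from N(mean, sigma); discard and redraw negative values."""
    value = rng.normal(mean, sigma)
    while value < 0.0:
        value = rng.normal(mean, sigma)
    return value

def mfv_hpb_ci(values, sigmas, level=0.6827, n_boot=2000, seed=0):
    """Hybrid parametric bootstrap: resample points, then simulate values."""
    values = np.asarray(values, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    rng = np.random.default_rng(seed)
    n = values.size
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)              # non-parametric step
        sim = [draw_nonnegative(rng, values[i], sigmas[i]) for i in idx]  # parametric step
        estimates[b] = mfv(sim)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(estimates, [alpha, 1.0 - alpha])
    return mfv(values), float(lo), float(hi)

# Illustrative numbers only (not the cross-section data from Table 1).
values = [10.0, 10.2, 9.9, 10.1, 30.0]
sigmas = [0.3, 0.3, 0.3, 0.3, 3.0]
print(mfv_hpb_ci(values, sigmas, n_boot=500))
```

The interval returned here is generally asymmetric around the central MFV, as is the interval quoted in the abstract, because the percentile method does not force symmetry.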

By combining realistic uncertainty modeling with repeated resampling, the HPB method provides a statistically robust and physically meaningful way to quantify uncertainty in central estimates, even when working with small or irregular datasets.

The integration of the MFV with bootstrapping provides a powerful framework for analyzing small datasets that are costly to obtain or prone to outliers. This is especially beneficial in fields such as environmental monitoring and radiation measurement, where robust statistical methods are essential for accurate analysis. By repeatedly resampling and recalculating the MFV, the derived confidence intervals become more reliable and representative of the underlying data. Because no data points are discarded, information loss is kept to a minimum, making this a practical and effective approach for applications requiring high-precision statistical inference.

3. Description of the Methods and Results

Nuclear physics cross-section datasets provide compelling cases for applying robust statistical methods such as the MFV approach and bootstrapping. These datasets often contain inconsistencies, outliers, and non-Gaussian distributions stemming from experimental challenges, including fluctuations in neutron energy, difficulties with sample preparation, and uncertainties in detection methods. As a result, different laboratories may report varying cross-section values for the same nuclear reaction due to differences in equipment calibration, experimental procedures, and data processing methodologies. This variability exposes the limitations of conventional statistical tools, especially those that rely on the arithmetic mean, which can be significantly skewed by outliers.

The MFV method offers a more resilient measure of central tendency by identifying the most densely populated data region. When combined with bootstrapping to estimate confidence intervals, this approach yields robust and interpretable results, particularly for datasets with limited sample sizes. The use of advanced statistical techniques in nuclear physics improves the reliability of data analysis and enhances the precision of outcomes in critical applications, such as nuclear reactor design, isotope production, and radiation shielding.

As part of this work, a benchmark dataset based on neutron lifetime measurements [25] was used to further validate the MFV method. These measurements provided an independent check that the MFV approach yields consistent results. In fact, the same MFV value for neutron lifetime was obtained as reported in the original study [25], confirming the method’s accuracy and reproducibility.

The dataset of ¹⁰⁹Ag(n, 2n)^108mAg cross-sections in Table 1 served as a practical example for applying the MFV method and bootstrapping. The experimental setup introduced significant uncertainties and inconsistencies, reflecting its complexity. The MFV method mitigated the influence of outliers, and bootstrapping provided insight into the variability and confidence intervals of the data.

Measuring the ¹⁰⁹Ag(n, 2n)^108mAg cross-section remains challenging due to experimental difficulties, technical constraints, and limited resources. Generating and controlling fast neutrons is particularly problematic. Neutron sources, whether reactors or generators, can be difficult to monitor precisely in terms of flux and energy. Often, neutron energies are inferred from the cross-section ratios of other reactions, introducing additional uncertainty.

Because the cross-section is highly dependent on neutron energy, we restricted our analysis to measurements within the 14.7 ± 0.2 MeV range, specifically, between 14.5 and 14.9 MeV (see Table 1). This filtering ensured uniformity and reduced variability under different experimental energy conditions.

Sample preparation also affects the measurement quality. High-purity silver samples with well-defined geometry minimize bias [61]. Cadmium shielding is typically employed to suppress low-energy neutron capture on ¹⁰⁷Ag, although some residual contributions remain [61]. Gamma-ray spectrometry adds further uncertainty due to factors such as detector efficiency, counting statistics, and self-absorption corrections.

The half-life of ^108mAg has undergone significant revision over the past decades, which has also contributed to variations in the reported cross-section values for reactions producing this isomer. Early theoretical studies suggested a minimum half-life of 5 years [62], which was subsequently refined by a series of experimental investigations: from 127 ± 7 years [63] to 418 ± 15 years [64] to 437.7 ± 7.7 years [65], and, most recently, to 448 ± 27 years [66].

A landmark early experimental determination was reported in 1969 by Vonach et al., who estimated the half-life to be 310 ± 132 years [50]. Their approach relied on evaluating the absolute activity of a sample containing a known quantity of ^108mAg, and estimating the half-life using the cross-section of the ¹⁰⁹Ag(n, 2n)^108mAg reaction, for which they adopted a value of 670 ± 266 mb. Although this measurement carried a large uncertainty, it was based on a methodology distinct from later decay-spectroscopy-based measurements. As such, it provided valuable complementary information rooted in different experimental systematics. This makes the 1969 result a critical reference point for validating updated half-life determinations through modern statistical reanalysis.

In 2024, Song et al. [61] re-evaluated several fast-neutron activation cross-sections using an updated half-life for ^108mAg of 438 years, as shown in Table 2. Their adjustments significantly shifted the data distribution toward higher values. The low-value cluster between 200 and 300 mb visible in the original data (Figure 1) is absent from the re-evaluated set, whose values instead cluster tightly between 600 and 800 mb (Figure 2).

Figure 2, which presents the re-evaluated fast-neutron activation cross-section data for the ¹⁰⁹Ag(n, 2n)^108mAg reaction at $E_{n} = 14.7 \pm 0.2$ MeV, shows a marked improvement in data consistency compared to the original dataset displayed in Figure 1.

In Figure 1, the distribution of the cross-section values is notably irregular and skewed, with a broad spread of measurements ranging from low to high values. This dataset includes several low-valued outliers that significantly influence the statistical estimates. As a result, the weighted average (399 mb) is much lower than both the arithmetic mean (638 mb) and the MFV (685 mb). The mismatch between these indicators of central tendency highlights the presence of inconsistencies and the potential impact of outliers on the overall estimation.

In contrast, Figure 2 shows a more symmetrical and concentrated distribution, with most cross-section values clustered between approximately 640 and 780 mb. The re-evaluated dataset appears to reduce or eliminate the influence of earlier outliers. Consequently, the statistical measures are in close agreement. The weighted average is 728 mb, the arithmetic mean is 718 mb, and the MFV is 709 mb. The improved alignment among these values indicates that the dataset is more internally consistent and statistically robust.

A comparison of central tendency measures between the original and re-evaluated datasets reveals substantial differences, particularly in how each estimator responded to outliers. The arithmetic mean increased from 638 mb in Figure 1 to 718 mb in Figure 2, representing a percentage difference of approximately 12.5%. In contrast, the weighted mean underwent a dramatic increase from 399 mb to 728 mb, corresponding to an 82.5% difference. This large change indicates that the original dataset contained low-valued outliers that significantly influenced the weighted average, and their impact was effectively mitigated in the updated dataset. The MFV showed only a small change, rising from 685 mb to 709 mb, with a percent difference of about 3.5%. The relatively small change observed in the MFV between the two datasets suggests that this estimator is more forgiving with respect to data quality, exhibiting robustness against the presence of outliers or inconsistencies that significantly affect other estimators, such as the mean or weighted average.
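The quoted percentages can be reproduced with a short calculation. The sketch below assumes that "percent difference" means the change relative to the original-dataset value; under that assumption the three figures in the text follow directly from the reported estimates. The function name is illustrative.

```python
def percent_change(original, reevaluated):
    """Change of an estimate relative to its original-dataset value, in percent."""
    return (reevaluated - original) / original * 100.0


# (original, re-evaluated) central estimates in mb, as reported in the text.
estimates = {
    "arithmetic mean": (638.0, 718.0),
    "weighted mean": (399.0, 728.0),
    "MFV": (685.0, 709.0),
}

for name, (orig, reev) in estimates.items():
    print(f"{name}: {percent_change(orig, reev):.1f}%")
# arithmetic mean: 12.5%
# weighted mean: 82.5%
# MFV: 3.5%
```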

This comparison highlights the value of re-evaluating nuclear data using updated nuclear parameters. This study also demonstrates the importance of selecting appropriate statistical estimators—especially the MFV—for achieving reliable analysis. By correcting problematic measurements, the updated dataset shown in Figure 2 allowed for a more accurate estimation of central tendency and supported stronger confidence in the subsequent statistical analysis.

Furthermore, we aimed to evaluate the concept of calculating confidence intervals using a robust MFV estimator within the HPB framework. The HPB method explicitly incorporates the uncertainty associated with each individual data point. Our goal was to evaluate how sensitive the estimated confidence intervals were to the presence of outliers in the dataset. This comparison helped us to assess the robustness of different estimation approaches under different data quality conditions.

To do this, we estimated the confidence intervals using the HPB method based on the data in Table 1 (see Figure 1). We then repeated the analysis using the re-evaluated data from Song et al. [61], shown in Table 2. We matched rows between the two tables using neutron energy, original cross-sectional values, and associated uncertainties. The matching entries were then updated with the re-evaluated values. This ensured that the final dataset shown in Figure 2 incorporated the most current and accurate nuclear information, providing a solid foundation for robust confidence interval estimation.
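The row-matching step can be sketched as a keyed lookup. The example below is only an illustration: the field names and the numerical rows are hypothetical and do not come from Table 1 or Table 2. It matches on exact equality of energy, original cross-section, and original uncertainty; tabulated data with rounding differences would need a tolerance instead.

```python
def apply_reevaluation(original_rows, reevaluated_rows):
    """Replace matched original entries with their re-evaluated values.

    A row is matched on (energy, original cross-section, original uncertainty).
    Unmatched rows are carried over unchanged.
    """
    updates = {
        (r["energy_mev"], r["old_xs_mb"], r["old_unc_mb"]): r
        for r in reevaluated_rows
    }
    merged = []
    for row in original_rows:
        key = (row["energy_mev"], row["xs_mb"], row["unc_mb"])
        new = updates.get(key)
        if new is None:
            merged.append(dict(row))
        else:
            merged.append({**row, "xs_mb": new["new_xs_mb"], "unc_mb": new["new_unc_mb"]})
    return merged


# Hypothetical rows for illustration only (not values from the paper's tables).
original = [
    {"energy_mev": 14.7, "xs_mb": 250.0, "unc_mb": 30.0},
    {"energy_mev": 14.8, "xs_mb": 700.0, "unc_mb": 50.0},
]
reevaluated = [
    {"energy_mev": 14.7, "old_xs_mb": 250.0, "old_unc_mb": 30.0,
     "new_xs_mb": 690.0, "new_unc_mb": 80.0},
]

for row in apply_reevaluation(original, reevaluated):
    print(row)
```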

The HPB method, originally proposed in [48] and fully described in [19], has been successfully applied in nuclear safety evaluations and environmental monitoring [49], where data quality varies and outliers are common. In this case, its application is particularly appropriate given the limited dataset and the presence of variable uncertainty levels.

4. Discussion

To validate the correctness and consistency of the randomized bootstrap sample values used in the MFV-HPB statistical framework, we analyzed a subset of four cross-section entries from Table 1, specifically, $x_{1}$, $x_{2}$, $x_{3}$, and $x_{4}$. These values were randomly sampled using a Gaussian distribution based on their reported central values and uncertainties. Figure 3 presents the histograms of these samples along with fitted Gaussian curves.

To evaluate how closely the randomized bootstrap samples matched the original measurements, each histogram was fitted with a Gaussian distribution. The resulting mean ($\mu$) and standard deviation ($\sigma$) from the fit were then compared to the corresponding reference values from Table 1. For each case, the absolute percent difference was calculated to quantify the deviation between the fitted and tabulated values:

$$
\Delta\mu = \left|\frac{\mu_{\mathrm{fit}} - \mu_{\mathrm{ref}}}{\mu_{\mathrm{ref}}}\right| \times 100\%, \qquad \Delta\sigma = \left|\frac{\sigma_{\mathrm{fit}} - \sigma_{\mathrm{ref}}}{\sigma_{\mathrm{ref}}}\right| \times 100\%
$$

Here, $\mu_{\mathrm{fit}}$ and $\sigma_{\mathrm{fit}}$ are the parameters obtained from the fit, while $\mu_{\mathrm{ref}}$ and $\sigma_{\mathrm{ref}}$ are the original values from the table. These percent differences are shown on each panel of Figure 3 and provide a simple measure of consistency between the fit results and the expected values. All differences were found to be small, indicating that the Gaussian randomization process preserved the statistical characteristics of the original data.
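The consistency check can be sketched as follows. Note the simplification: instead of fitting a Gaussian to a histogram, this example uses the sample mean and sample standard deviation of the draws as stand-ins for the fitted parameters. The reference values (700 ± 50 mb), the number of draws, and the seed are hypothetical.

```python
import random
import statistics


def percent_difference(fit, ref):
    """Absolute percent difference between a fitted and a reference value."""
    return abs((fit - ref) / ref) * 100.0


def check_randomization(mu_ref, sigma_ref, n_draws=100_000, seed=12345):
    """Draw Gaussian values around one measurement and compare moments.

    Sample moments are used here in place of a histogram fit.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(mu_ref, sigma_ref) for _ in range(n_draws)]
    mu_fit = statistics.fmean(draws)
    sigma_fit = statistics.stdev(draws)
    return percent_difference(mu_fit, mu_ref), percent_difference(sigma_fit, sigma_ref)


# Hypothetical measurement: 700 +/- 50 mb.
d_mu, d_sigma = check_randomization(700.0, 50.0)
print(f"delta_mu = {d_mu:.3f}%, delta_sigma = {d_sigma:.3f}%")
```

With this many draws, both differences are expected to be well below one percent for an untruncated Gaussian.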

It is worth noting that, for the $x_{1}$ dataset, the reported uncertainty in Table 1 is comparable in magnitude to the cross-section value itself. This presents a potential challenge when generating randomized bootstrap values using a Gaussian distribution, as the sampling process could yield negative values. Since negative cross-sections have no physical meaning, any such values encountered during the bootstrap process were discarded, and a new random sample was drawn in their place to preserve physical consistency. This correction mechanism can slightly distort the expected shape of the resulting distribution. Consequently, the fitted Gaussian parameters for $x_{1}$ show a modest deviation from the reference value. However, this discrepancy remains much smaller than the associated uncertainty, and, as such, does not compromise the reliability or conclusions drawn from the MFV-HPB analysis.
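The discard-and-redraw rule, and the distortion it introduces, can be illustrated with a small sketch. The values used (100 ± 100, in arbitrary units) are hypothetical and chosen only so that the uncertainty is comparable to the central value; the helper name and the retry limit are assumptions.

```python
import random


def draw_positive(mu, sigma, rng, max_tries=10_000):
    """Draw from a Gaussian, redrawing until the value is strictly positive."""
    for _ in range(max_tries):
        value = rng.gauss(mu, sigma)
        if value > 0.0:
            return value
    raise RuntimeError("no positive draw obtained; check mu and sigma")


rng = random.Random(2025)
# Hypothetical measurement whose uncertainty equals its central value.
draws = [draw_positive(100.0, 100.0, rng) for _ in range(20_000)]
mean_of_draws = sum(draws) / len(draws)
print(f"smallest draw = {min(draws):.3f}, mean of accepted draws = {mean_of_draws:.1f}")
```

Because the rejected draws all lie below zero, the accepted draws follow a truncated Gaussian whose mean sits above the nominal central value. This is the kind of modest shift described above.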

Overall, the excellent agreement between the Gaussian fits and the original cross-section values affirms that the randomized samples preserve the statistical characteristics of the source data. This outcome confirms the reliability of the bootstrapped datasets as valid inputs for robust estimation using the MFV-HPB framework.

Figure 4 illustrates the distribution of MFV estimates for the ¹⁰⁹Ag(n, 2n)^108mAg reaction using the original fast-neutron cross-section dataset from Table 1. These estimates were generated using HPB, a method that incorporates both measurement uncertainties and sampling variability [49].

The x-axis shows the MFV values obtained from individual bootstrap samples, expressed in millibarns (mb), while the y-axis represents the normalized density. The distribution is centered around the original dataset’s MFV of 685 mb, as indicated by a solid vertical line. The 68.27% confidence interval (marked by dashed blue lines) ranges from approximately 661 mb to 715 mb. Although not shown in the figure, a broader 2-sigma interval was also calculated to extend from about 633 mb to 745 mb. The intervals were determined using the percentile method.
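The percentile method reads the interval bounds directly from the sorted bootstrap estimates. The sketch below assumes linear interpolation between order statistics, which is one of several common quantile conventions; the convention used in the study is not stated in this section, and the function names are illustrative.

```python
import math


def quantile(ordered, q):
    """Quantile of an already sorted list, with linear interpolation."""
    pos = q * (len(ordered) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def percentile_interval(estimates, level=0.6827):
    """Central interval containing `level` of the bootstrap estimates."""
    ordered = sorted(estimates)
    alpha = (1.0 - level) / 2.0
    return quantile(ordered, alpha), quantile(ordered, 1.0 - alpha)


# Toy input: the integers 0..100 stand in for bootstrap estimates.
toy = list(range(101))
print(percentile_interval(toy, level=0.5))   # (25.0, 75.0)
```

For the intervals quoted in the text, `level` would be 0.6827 and 0.9545 and the input would be the bootstrap MFV estimates.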

To ensure statistical reliability, 500,000 bootstrap replicates were generated. This large sample size helped us to reduce the influence of random fluctuations and improved the precision of the resulting confidence intervals. Additionally, only physically meaningful, positive-valued cross-sectional data were used in the analysis, especially in light of the large uncertainties associated with some older measurements, such as the 1969 value of 670 ± 266 mb [50].

The resulting histogram is narrow and symmetric with minimal skew. This confirms that the MFV is a stable and reliable central estimate even when applied to a dataset that includes variable uncertainties and possible outliers. The narrow confidence interval further demonstrates that the combined MFV and HPB approach [48] provides a robust framework for summarizing central tendency and uncertainty in nuclear cross-sectional data.

Figure 5 presents a similar analysis applied to the re-evaluated dataset with the updated cross-sectional values from Table 2. This dataset incorporates updated nuclear information, including a revised half-life for ^108mAg, and yields an MFV centered at approximately 709 mb. The histogram again displays the distribution of MFV values derived from HPB resampling.

The shape of the distribution in Figure 5 is slightly right skewed, with the 68.27% confidence interval spanning from 691 mb to 744 mb. The corresponding 2-sigma interval ranges from 666 mb to 774 mb. Compared with the original data in Figure 4, the revised dataset produced slightly higher MFV estimates while maintaining a similar spread. This shift reflects a higher concentration of consistent values in the upper cross-section range, as seen in the re-evaluated measurements (see Figure 2).

The stability of the MFV across both the original and re-evaluated datasets underscores the strength of the proposed method. While conventional metrics like the mean or weighted average are easily distorted by extreme values or skewed uncertainties, the MFV remains centered within the most densely populated region of the data. By combining the MFV with HPB, we can generate confidence intervals that are not only statistically rigorous but also not sensitive to the quality and distribution of the original measurements.

These findings highlight the advantages of MFV-based analysis of nuclear data, particularly in the presence of heterogeneous uncertainty and limited experimental repetition. The proposed method reduces the bias introduced by outliers and leverages resampling to quantify uncertainty without assuming a specific underlying distribution. Although computationally intensive, the MFV and HPB approaches are well suited for nuclear datasets where precision matters and data collection is costly.

Together, Figure 4 and Figure 5 demonstrate that combining the MFV estimator with hybrid bootstrapping results in a consistent and interpretable analysis framework. This methodology offers a promising direction for future nuclear data evaluation efforts, especially when dealing with incomplete, conflicting, or high-uncertainty datasets.

For datasets containing more than 10 elements, the use of non-parametric bootstrapping is generally considered appropriate for estimating statistical confidence intervals [67]. Singh et al. [68] recommend this guideline. In this study, we used a dataset of 31 values, which would normally allow for such non-parametric bootstrapping. However, we chose the hybrid parametric bootstrap method instead. Unlike traditional resampling, HPB accounts for the measurement uncertainty of each data point and incorporates it directly into the resampling process. As a result, it produces confidence intervals that are more representative of real-world conditions. In applications such as nuclear cross-section calculations, it is also essential to ensure that generated values remain physically valid. For example, cross-sections must be strictly positive. The HPB method allows such constraints to be imposed as part of its core procedure.
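The two-stage structure of the hybrid parametric bootstrap can be sketched as below. This is an illustration under stated assumptions, not the author's implementation: the function name, arguments, seed, and retry limit are hypothetical, and the estimator is passed in as a callable. To keep the block self-contained, the usage example plugs in the median as a stand-in estimator; the study itself uses the MFV. The three input values are the most recent half-life measurements quoted in the text.

```python
import random
import statistics


def hybrid_parametric_bootstrap(values, sigmas, estimator, n_boot=2000,
                                seed=0, positive_only=True, max_tries=1000):
    """Return one estimator value per bootstrap replicate.

    Each replicate (1) resamples the measurements with replacement and
    (2) replaces every resampled measurement by a Gaussian draw centered on
    its value with its reported uncertainty as the standard deviation.
    """
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        replicate = []
        for _ in range(n):
            i = rng.randrange(n)                         # non-parametric step
            for _ in range(max_tries):
                draw = rng.gauss(values[i], sigmas[i])   # parametric step
                if not positive_only or draw > 0.0:
                    break                                # physically valid draw
            else:
                raise RuntimeError("could not obtain a positive draw")
            replicate.append(draw)
        estimates.append(estimator(replicate))
    return estimates


# Half-life values in years quoted in the text: 418 +/- 15, 437.7 +/- 7.7, 448 +/- 27.
half_lives = [418.0, 437.7, 448.0]
uncertainties = [15.0, 7.7, 27.0]

# The median is a stand-in estimator used only for this illustration.
boot = hybrid_parametric_bootstrap(half_lives, uncertainties, statistics.median)
print(len(boot), "bootstrap estimates; smallest and largest:", min(boot), max(boot))
```

Passing the estimator as an argument keeps the resampling logic separate from the choice of central-value statistic, so the same loop can be reused with an MFV routine.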

To evaluate the robustness of the MFV-HPB method when applied to small datasets, we analyzed five published half-life values for ^108mAg, each reporting a central estimate and associated uncertainty. These values span nearly five decades of research, ranging from Vonach et al.’s 1969 result to the latest measurements in 2018. The earliest estimate, 310 ± 132 years [50], relied on neutron activation and cross-section-based modeling, whereas subsequent values were derived from decay spectroscopy with progressively improved precision.

Using all five values, the MFV analysis produced a half-life of 433.5 years, with a 68.27% confidence interval (1-sigma) of [299.2, 440.9] years and a 95.45% interval (2-sigma) of [124.4, 456.7] years. These broad intervals reflect both the limited sample size and the large uncertainty associated with the 1969 measurement.

To test whether the earlier, less precise estimate significantly influenced the result, we repeated the MFV-HPB analysis using only the three most recent and precise values from 1992, 2004, and 2018. This yielded an MFV of 439.1 years with narrower confidence intervals: [420.3, 445.3] years for 1-sigma and [404.1, 466.5] years for 2-sigma.

The difference between the two MFV results was modest (less than 6 years) and well within the respective uncertainty ranges. Notably, the second analysis excluded not only the earliest measurement by Vonach et al. [50] (310 ± 132 years), but also the 1970 value of 127 ± 7 years [63], which is significantly lower than later estimates and based on earlier-generation measurement techniques. Despite removing these two lower and more uncertain values, the resulting MFV changed only slightly.

The bootstrap values for the hybrid parametric bootstrap analysis, focusing on the three latest half-life measurements of ^108mAg, are shown in Figure 6. A summary of these values is also provided in Table 3. Each colored distribution in Figure 6 corresponds to the set of simulated half-life values for one of the original measurements: 418 ± 15 years (HL3_1), 437.7 ± 7.7 years (HL3_2), and 448 ± 27 years (HL3_3). For each dataset, 100,000 bootstrap replicates were generated by randomly sampling according to the reported measurement uncertainties, assuming Gaussian distributions. The gray histogram in the background shows the combined distribution of all simulated values. The plotted histograms illustrate the spread and central tendency of the simulated half-lives, with clear separation between the peaks corresponding to each original measurement.

Table 3 provides a quantitative comparison between the fitted Gaussian parameters extracted from the simulated datasets and the original measured values. For each original measurement, the mean ($\mu$) and standard deviation ($\sigma$) obtained from the bootstrap simulations are compared to the published values. The absolute percent differences ($\Delta\mu$ and $\Delta\sigma$) are extremely small, all below 0.5%, indicating excellent agreement between the simulations and the experimental data. This validates that the bootstrap sampling process accurately preserved the statistical properties of the original measurements without introducing additional bias or systematic deviation.

The close agreement between the MFV values derived from the five-element and three-element datasets demonstrates that the central estimate remains stable even when early, high-uncertainty, or outlier data are excluded. It further underscores the MFV-HPB method’s ability to yield reliable central values from small, heterogeneous, and historically diverse datasets. This reinforces its usefulness in validating or reinterpreting legacy nuclear measurements using modern statistical approaches.

In both cases, the confidence intervals were estimated using the percentile method, which extracts quantiles from the empirical bootstrap distribution. To improve statistical accuracy and reduce variability, the MFV was computed for each of 100,000 bootstrap samples. This large number of resamples ensured precise estimation of the confidence bounds even with limited source data and considered the uncertainty associated with each element in the dataset (see Table 3). In addition, a large bootstrap sample size was selected to minimize the introduction of additional statistical errors due to the inherent randomness of the HPB technique. To validate this, we performed a Gaussian fit for each individual data point generated during resampling and confirmed that both the mean and standard deviation of the fitted distribution matched the original source values. This step confirmed that the HPB procedure truly preserved the original data structure and uncertainty, thereby ensuring the reliability of the resulting MFV-based confidence intervals.

An important observation from this analysis is that the confidence intervals derived using the MFV-HPB method were not necessarily symmetric around the central value. This contrasts with intervals based on the arithmetic mean, which typically assume normally distributed data and produce symmetric intervals. The asymmetry observed in the MFV-HPB results more accurately reflects the underlying distribution of the data and highlights the method’s suitability for non-Gaussian datasets.
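The asymmetry can be made explicit by converting each reported interval into separate lower and upper offsets from the MFV. The sketch below uses the half-life results quoted in the Discussion; the function name is illustrative.

```python
def asymmetric_offsets(center, lower, upper):
    """Distances from the central value to the lower and upper bounds."""
    return round(center - lower, 1), round(upper - center, 1)


# 68.27% intervals in years, as reported in the Discussion.
five_values = asymmetric_offsets(433.5, 299.2, 440.9)
three_values = asymmetric_offsets(439.1, 420.3, 445.3)

print(f"five values:  -{five_values[0]} / +{five_values[1]} years")    # -134.3 / +7.4
print(f"three values: -{three_values[0]} / +{three_values[1]} years")  # -18.8 / +6.2
```

In both cases the lower offset is larger than the upper one, so quoting a single symmetric ± value would misrepresent the interval.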

Despite its strengths, the MFV-HPB framework is not without limitations. Like other statistical methods, it performs best when most data are concentrated near the true value. If erroneous or biased values overwhelm a dataset, even the MFV-HPB approach can yield misleading results. This limitation is not unique to MFV; it represents a general challenge in statistical inference that underscores the importance of high-quality data.

In addition, the HPB method is computationally intensive, especially when large numbers of bootstrap samples are required or when applied to larger datasets. Nonetheless, this approach remains highly effective in situations in which measurement uncertainty is significant and traditional statistical assumptions do not hold. The flexibility and reliability of the proposed method make it a valuable tool not only in nuclear data analysis but also in other fields, such as biophysics and medical diagnostics, where datasets are often small, noisy, and irregularly distributed.

The MFV-HPB framework is a method developed to provide strong estimates of central values and their confidence intervals in datasets with variability and potential outliers. This framework faces specific challenges when dealing with datasets with multimodal distributions, which are common in areas like molecular dynamics and the microscale mechanical mapping of cancer cells, where bimodal or multimodal systems often occur [69,70]. The MFV-HPB procedure can be performed in a “mode-by-mode” approach. By isolating each peak and recalculating the MFV with a confidence interval using a hybrid bootstrap method, this approach usually offers more precise confidence intervals for each mode. The MFV step reduces the impact of extreme data points within each subgroup, leading to MFV-HPB confidence intervals that are systematically narrower than those from traditional methods, thus providing more insightful summaries of multimodal datasets.

In this study, the dataset represents a single physical parameter: the cross-section value of a particular nuclear reaction at a specific neutron energy. Ideally, the data should converge to a value that accurately reflects this parameter. However, multiple peaks or modes can appear in the data. These multimodal features often result from differences in methods, inconsistent measurement techniques, or hidden systematic errors rather than actual physical variations.

In the analysis of neutron lifetime data [25], different experimental methods produced several groups of values. Although these groups varied, there should be one true central value for the neutron lifetime because it is a fundamental physical constant. The MFV approach helped us to find a reliable central value that considered all data points and reduced the effects of clusters caused by specific measurement techniques.

Similarly, this study initially observed a bimodal pattern in the cross-section data for the reaction ¹⁰⁹Ag(n, 2n)^108mAg (see Figure 1). This pattern mainly arose from differences in the reported half-life of ^108mAg. Some values relied on older half-life estimates, whereas others used updated measurements. Adjusting the dataset to correct these systematic differences reduced the bimodal effect (see Figure 2), supporting the conclusion that a single value can accurately describe the cross-section.

The MFV-HPB framework considers the uncertainties of each data point, thereby providing a more complete estimate of the central value and its confidence intervals. In contrast, the Particle Data Group (PDG) uses a weighted mean method, which also considers uncertainties but tends to bias the central value toward data points with the smallest uncertainties [19]. This bias can become an issue when the smallest uncertainty does not truly reflect better measurement accuracy but instead arises from unrecognized systematic errors or possible mistakes in data reporting [24].
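The pull of a weighted mean toward small-uncertainty points can be seen in a short example. The sketch assumes the standard inverse-variance weighting (weights proportional to 1/σ²) and omits any scale-factor corrections an evaluation group might apply. The data are hypothetical: three consistent values with large uncertainties and one discrepant value with a very small quoted uncertainty.

```python
import statistics


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its formal uncertainty."""
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, (1.0 / total) ** 0.5


# Hypothetical cross-sections in mb (illustration only).
values = [700.0, 710.0, 720.0, 300.0]
sigmas = [50.0, 50.0, 50.0, 5.0]

wmean, wunc = weighted_mean(values, sigmas)
print(f"arithmetic mean = {statistics.fmean(values):.1f} mb")
print(f"weighted mean   = {wmean:.1f} +/- {wunc:.1f} mb")
```

Here the single low point carries a weight one hundred times that of each of the others, so the weighted mean lands close to it even though three of the four measurements agree with one another.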

5. Conclusions

In this study, we presented a robust statistical approach that combines Steiner’s most frequent value method with a hybrid bootstrap procedure to estimate central tendency and confidence intervals for datasets affected by variability, outliers, or limited sample size. We applied this method to fast-neutron activation cross-section measurements of the ¹⁰⁹Ag(n, 2n)^108mAg reaction at 14.7 ± 0.2 MeV, a dataset characterized by significant uncertainties and inconsistencies.

Our findings show that the MFV provides a stable and reliable central estimate, even when traditional metrics like the arithmetic or weighted mean are distorted by outliers. Combined with the hybrid bootstrap procedure, which models both measurement uncertainty and sampling variability, the method produced interpretable and well-constrained confidence intervals. For the cross-section data, the MFV was determined to be 709 mb, with a 68.27% confidence interval of [691, 744] mb and a 95.45% confidence interval of [666, 774] mb, based on 500,000 hybrid bootstrap replicates.

To further demonstrate the method’s flexibility, we applied the MFV-HPB framework to estimate the half-life of the ^108mAg isotope using a small dataset of published measurements. When analyzing five available values (excluding the earliest lower-limit estimate from 1960), the MFV was found to be 433.5 years, with a 68.27% confidence interval of [300.6, 441.1] years and a 95.45% interval of [124.4, 456.6] years. A secondary analysis, considering only the three most recent and precise measurements, yielded an MFV of 439.1 years with narrower confidence intervals. The close agreement between these results confirmed that the method maintains stability even when earlier, higher-uncertainty data are excluded.

Importantly, the MFV-HPB method does not assume any particular shape for the underlying data distribution. As observed in the half-life analysis, the resulting confidence intervals can be asymmetric, providing a more accurate reflection of the real uncertainty structure compared to traditional Gaussian-based methods.

The hybrid bootstrap method used in this study combines non-parametric resampling of the original data entries with parametric simulation based on their individual uncertainties. Unlike a purely parametric bootstrap, where each original data point contributes exactly one simulated value per replicate, our approach first resamples the data points with replacement. This adds an extra layer of variability, capturing uncertainty about which measurements are most representative. Especially in small datasets or when uncertainties are heterogeneous—as in early half-life measurements—this strategy produces slightly wider and more conservative confidence intervals, better representing the true uncertainty.

Although this work focused on nuclear physics examples, the statistical challenges addressed here—such as data sparsity, measurement uncertainty, and non-Gaussian behavior—are common across many fields, including biomolecular research, environmental monitoring, and diagnostics. As a result, the MFV-HPB framework provides a generalizable and reliable tool for extracting trustworthy insights from real-world data.

We hope that this study will encourage broader adoption of the MFV-HPB methodology in scientific data analysis, particularly in situations where robustness and interpretability are critical. Future efforts could expand its application to additional scientific domains and help formalize its integration into standard evaluation practices. Ultimately, the MFV-HPB approach offers a statistically rigorous and adaptable pathway for addressing uncertainty in complex datasets.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This manuscript’s associated data are available in the following repository: https://osf.io/g2h3m/ [71] (accessed on 7 May 2025). To validate the reliability of the method, the author used published neutron lifetime data as a benchmark (included in the repository). This confirmed that the MFV estimate was consistent with the findings of the original study [25].

Acknowledgments

I would like to sincerely thank Maria Filimonova for her invaluable support and assistance throughout this work. I am also grateful to the management and staff at Canadian Nuclear Laboratories for providing a supportive environment for this study, with special thanks to Genevieve Hamilton and David Yuke. I also appreciate the thoughtful comments and suggestions from the anonymous reviewers, which have helped improve the quality of this paper. During the preparation of this work, the author used ChatGPT (version: 4o-mini) to check the language of the manuscript. After using this tool, the author reviewed and edited the content as needed and takes full responsibility for the content of the publication.

Conflicts of Interest

The author declares that they have no conflicts of interest.

Abbreviations

The following abbreviations were used in this manuscript:

CI	Confidence interval
CRL	Chalk River Laboratories
HPB	Hybrid parametric bootstrap
MeV	Mega electron volt
MFV	Most frequent value
KL	Kullback–Leibler

References

Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef] [PubMed]
Ahmadraji, T.; Gonzalez-Macia, L.; Ritvonen, T.; Willert, A.; Ylimaula, S.; Donaghy, D.; Tuurala, S.; Suhonen, M.; Smart, D.; Morrin, A.; et al. Biomedical Diagnostics Enabled by Integrated Organic and Printed Electronics. Anal. Chem. 2017, 89, 7447–7454. [Google Scholar] [CrossRef] [PubMed]
Ladisch, M.R.; Cooney, C.L.; Dean, R.C., Jr.; Humphrey, A.E.; Kirk, T.K.; McIntire, L.V.; Michaels, A.S.; Myers-Keith, P.; Ryu, D.D.Y.; Swartz, J.R.; et al. Putting Biotechnology to Work: Bioprocess Engineering; National Academies Press: Washington, DC, USA, 1992. [Google Scholar]
De Torrenté, L.; Zimmerman, S.; Suzuki, M.; Christopeit, M.; Greally, J.M.; Mar, J.C. The Shape of Gene Expression Distributions Matter: How Incorporating Distribution Shape Improves the Interpretation of Cancer Transcriptomic Data. BMC Bioinform. 2020, 21, 562. [Google Scholar] [CrossRef]
Huang, S.T.; Lederer, J. DeepMoM: Robust Deep Learning with Median-of-Means. J. Comput. Graph. Stat. 2023, 32, 181–195. [Google Scholar] [CrossRef]
Evans, C.; Hardin, J.; Stoebel, D.M. Selecting Between-Sample RNA-Seq Normalization Methods from the Perspective of Their Assumptions. Briefings Bioinform. 2018, 19, 776–792. [Google Scholar] [CrossRef]
Idrees, Z.; Zheng, L. Low Cost Air Pollution Monitoring Systems: A Review of Protocols and Enabling Technologies. J. Ind. Inf. Integr. 2020, 17, 100123. [Google Scholar] [CrossRef]
Bernasconi, S.; Angelucci, A.; Aliverti, A. A Scoping Review on Wearable Devices for Environmental Monitoring and Their Application for Health and Wellness. Sensors 2022, 22, 5994. [Google Scholar] [CrossRef]
Hutchinson, N. Understanding and Controlling Sources of Process Variation: Risks to Achieving Product Critical Quality Attributes. Bioprocess Int. 2014, 12, 24–29. [Google Scholar]
McDonnell, S.; Principe, R.F.; Zamprognio, M.S.; Whelan, J. Challenges and Emerging Technologies in Biomanufacturing of Monoclonal Antibodies (mAbs). In Biotechnology; Villarreal-Gómez, L.J., Ed.; IntechOpen: Rijeka, Croatia, 2022; Chapter 8. [Google Scholar] [CrossRef]
Nicholls, A. Confidence Limits, Error Bars and Method Comparison in Molecular Modeling. Part 1: The Calculation of Confidence Intervals. J. Comput.-Aided Mol. Des. 2014, 28, 887–918. [Google Scholar] [CrossRef]
Baek, S.U.; Yoon, J.H. High-Sensitivity c-Reactive Protein Levels in Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD), Metabolic Alcohol-Associated Liver Disease (MetALD), and Alcoholic Liver Disease (ALD) with Metabolic Dysfunction. Biomolecules 2024, 14, 1468. [Google Scholar] [CrossRef]
Csernyák, L.; Steiner, F. Practical Computation of the Most Frequent Value of Data Systems. Acta Geod. Geophys. Montan. Acad. Sci. Hung. 1980, 15, 59–73. [Google Scholar]
Steiner, F. M-Fitting (Fitting According to the Most Frequent Value) and Its Comparison with the Method of Least Squares. Acta Geod. Geophys. Montan. Acad. Sci. Hung. 1980, 15, 37–44. [Google Scholar]
Csernyák, L.; Hajagos, B.; Steiner, F. General Validity of the Law of Large Numbers in Case of Adjustments According to the Most Frequent Value. Acta Geod. Geophys. Montan. Acad. Sci. Hung. 1981, 16, 73–90. [Google Scholar]
Hajagos, B. Der Häufigste Wert, Als Eine Abschätzung von Minimalem Informationsverlust Etc. Publ. Tech. Univ. Heavy Ind. Ser. A Min. 1982, 37, 95–114. [Google Scholar]
Steiner, F. Most Frequent Value Procedures (a Short Monograph). Geophys. Trans. 1988, 34, 139–260. [Google Scholar]
Golovko, V.V.; Kamaev, O.; Sun, J. Unveiling Insights: Harnessing the Power of the Most-Frequent-Value Method for Sensor Data Analysis. Sensors 2023, 23, 8856. [Google Scholar] [CrossRef]
Golovko, V. Improving Confidence Intervals and Central Value Estimation in Small Datasets through Hybrid Parametric Bootstrapping. Inf. Sci. 2025, 716, 122254. [Google Scholar] [CrossRef]
Steiner, F. Most Frequent Value and Cohesion of Probability Distributions. Acta Geod. Geophys. Montan. Acad. Sci. Hung. 1973, 8, 381–395. [Google Scholar]
Zyla, P.A.; Barnett, R.M.; Beringer, J.; Dahl, O.; Gerber, H.J.; Gershon, T.; Gershtein, Y.; Gherghetta, T.; Godizov, A.A.; Gonzalez-Garcia, M.C.; et al. Review of Particle Physics. Prog. Theor. Exp. Phys. 2020, 2020, 083C01. [Google Scholar] [CrossRef]
Tolner, F.; Barta, B.; Eigner, G. Outlier Identification with MFV-robustified Linear Regression in Case of Economic Convergence of EU NUTS Regions. Acta Polytech. Hung. 2024, 21, 47–66. [Google Scholar] [CrossRef]
Zhang, J.; Li, L.; Su, H.; Chen, Y.; Shi, W. Most Frequent Value Analysis of Distance Measurements to M87. Mon. Not. R. Astron. Soc. 2024, 533, 2916–2926. [Google Scholar] [CrossRef]
Golovko, V.V. Application of the Most Frequent Value Method for 39Ar Half-Life Determination. Eur. Phys. J. C 2023, 83, 930. [Google Scholar] [CrossRef]
Zhang, J.; Zhang, S.; Zhang, Z.R.; Zhang, P.; Li, W.; Hong, Y. MFV Approach to Robust Estimate of Neutron Lifetime. Eur. Phys. J. C 2022, 82, 1106. [Google Scholar] [CrossRef]
Szabó, N.P.; Balogh, G.P.; Stickel, J. Most Frequent Value-Based Factor Analysis of Direct-Push Logging Data. Geophys. Prospect. 2018, 66, 530–548. [Google Scholar] [CrossRef]
Zhang, J. Most Frequent Value Statistics and the Hubble Constant. Publ. Astron. Soc. Pac. 2018, 130, 084502. [Google Scholar] [CrossRef]
Tolner, F.; Fegyverneki, S.; Barta, B.; Eigner, G. Robust Clustering Based on the Most Frequent Value Method. Multidiszcip. Tudományok 2023, 13, 141–153. [Google Scholar] [CrossRef]
Zhang, J. Most Frequent Value Statistics and Distribution of 7Li Abundance Observations. Mon. Not. R. Astron. Soc. 2017, 468, 5014–5019. [Google Scholar] [CrossRef]
Golovko, V.V. Simplified Efficiency Calibration Methods for Scintillation Detectors Used in Nuclear Remediation. J. Clean. Prod. 2024, 478, 143910. [Google Scholar] [CrossRef]
Steiner, F. The Most Frequent Value. Introduction to a Modern Conception of Statistics; Akadémiai Kiadó: Budapest, Hungary, 1991. [Google Scholar]
Steiner, F. Optimum Methods in Statistics; Akadémiai Kiadó: Budapest, Hungary, 1997. [Google Scholar]
Cauchy, A.L. Sur Les Résultats Moyens d’observations de Même Nature, et Sur Les Résultats Les plus Probables. Comptes Rendus de l’Académie des Sci. de Paris 1853, 37, 198–206. [Google Scholar]
Lorentz, H. The Absorption and Emission Lines of Gaseous Bodies. Proc. R. Neth. Acad. Arts Sci. 1906, 8, 591–611. [Google Scholar]
Breit, G.; Wigner, E. Capture of Slow Neutrons. Phys. Rev. 1936, 49, 519–531. [Google Scholar] [CrossRef]
Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
Walck, C. Hand-Book on Statistical Distributions for Experimentalists, University of Stockholm; Internal Report SUF–PFY/96–01; University of Stockholm: Stockholm, Sweden, 2007. [Google Scholar]
Csernyák, L. Bemerkung Zum Artikel ’Der Häufigste Wert, Als Eine Abschätzung von Minimalem Informationsverlust Etc.’ von B. Hajagos. Publ. Tech. Univ. Heavy Ind. Ser. A Min. 1982, 37, 115–118. [Google Scholar]
Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630. [Google Scholar] [CrossRef]
Verdú, S. The Cauchy Distribution in Information Theory. Entropy 2023, 25, 346. [Google Scholar] [CrossRef]
Csiszar, I. I-Divergence Geometry of Probability Distributions and Minimization Problems. Ann. Probab. 1975, 3, 146–158. [Google Scholar] [CrossRef]
Hajagos, B.; Steiner, F. MFV-filtering to Suppress Errors—A Comparison with Median Filters. Acta Geod. Geoph. Mont. Hung. 1992, 27, 185–194. [Google Scholar]
Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA; London, UK; New York, NY, USA; Washington, DC, USA, 1994. [Google Scholar]
Davison, A.C.; Hinkley, D.V. Bootstrap Methods and Their Applications; Cambridge University Press: Cambridge, UK, 1997. [Google Scholar]
Puth, M.T.; Neuhäuser, M.; Ruxton, G.D. On the Variety of Methods for Calculating Confidence Intervals by Bootstrapping. J. Anim. Ecol. 2015, 84, 892–897. [Google Scholar] [CrossRef]
Mokhtar, S.F.; Yusof, Z.M.; Sapiri, H. Confidence Intervals by Bootstrapping Approach: A Significance Review. Malays. J. Fundam. Appl. Sci. 2023, 19, 30–42. [Google Scholar] [CrossRef]
Kostanek, J.; Karolczak, K.; Kuliczkowski, W.; Watala, C. Bootstrap Method as a Tool for Analyzing Data with Atypical Distributions Deviating from Parametric Assumptions: Critique and Effectiveness Evaluation. Data 2024, 9, 95. [Google Scholar] [CrossRef]
Golovko, V.V. Estimation of Ru-97 Half-Life Using the Most Frequent Value Method and Bootstrapping Techniques. arXiv 2024, arXiv:2410.19988. [Google Scholar] [CrossRef]
Golovko, V.V. Optimizing Sensor Data Interpretation via Hybrid Parametric Bootstrapping. Sensors 2025, 25, 1183. [Google Scholar] [CrossRef] [PubMed]
Vonach, H.; Hille, M.; Hille, P. Determination of the Half-Life of ^108mAg by Means of (n, 2n)-Cross-Sections of Silver. Z. Phys. A Hadron. Nucl. 1969, 227, 381–390. [Google Scholar] [CrossRef]
Csikai, J.; Buczkó, C.M.; Pepelnik, R.; Agrawal, H. Activation Cross-Sections Related to Nuclear Heating of High Tc Superconductors. Ann. Nucl. Energy 1991, 18, 1–4. [Google Scholar] [CrossRef]
Lu, H.; Yu, W.; Zhao, W. Research of Activation Cross Sections for Long-Lived Radionuclides on Elements of Cu, Mo, Ag, Eu and Tb; Technical Report INDC(NDS)–232/L; IAEA Nuclear Data Section: Vienna, Austria, 1990. [Google Scholar]
Wang, Y.; Yuan, J.; Yang, J.; Wang, H.; Shui, Y.; Ren, Z. Cross-Section Measurement for ¹⁰⁹Ag(n,2n)^108mAg Reaction. Nucl. Sci. Eng. 1992, 111, 314–316. [Google Scholar] [CrossRef]
Meadows, J.W.; Smith, D.L.; Greenwood, L.R.; Haight, R.C.; Ikeda, Y.; Konno, C. Measurement of Fast-Neutron Activation Cross Sections for Copper, Europium, Hafnium, Iron, Nickel, Silver, Terbium and Titanium at 10.0 and 14.7 MeV and for the Be (d, n) Thick-Target Spectrum. Ann. Nucl. Energy 1996, 23, 877–899. [Google Scholar] [CrossRef]
Qaim, S.; Cserpák, F.; Csikai, J. Excitation Functions of ¹⁰⁹Ag(n,2n)^108mAg, ¹⁵¹Eu(n,2n)^150mEu and ¹⁵⁹Tb(n,2n)¹⁵⁸Tb Reactions from Threshold to 15 MeV. Appl. Radiat. Isot. 1996, 47, 569–573. [Google Scholar] [CrossRef]
Ikeda, Y.; Konno, C.; Kumar, A.; Kasugai, Y. Summary of Activation Cross Sections Measurements at Fusion Neutron Source in JAERI; Nuclear Data Section INDC(NDS)-342; International Atomic Energy Agency: Vienna, Austria, 1996. [Google Scholar]
Csikai, J. Measured, Estimated and Calculated Cross Sections for the Generation of Long-Lived Radionuclides in Fast Neutron Induced Reactions; INDC-342, IAEA: Vienna, Austria, 1996. [Google Scholar]
Pashchenko, A.B. Report of the 2nd International Atomic Energy Agency Research Coordination Mtg. Activation Cross Section for the Generation of Long-Lived Radionuclides of Importance in Fusion Reactor Technology, St. Petersburg, Russia, 19–23 June 1995. Available online: https://www-nds.iaea.org/publications/indc/indc-nds-0344.pdf (accessed on 7 May 2025).
Luo, J.; Tuo, F.; Kong, X.; Liu, R.; Jiang, L. Activation Cross-Section for Reactions Induced by 14 MeV Neutrons on Natural Silver. Ann. Nucl. Energy 2009, 36, 718–722. [Google Scholar] [CrossRef]
Filatenkov, A.A. Neutron Activation Cross Sections Measured at KRI in Neutron Energy Region 13.4–14.9 MeV, Technical Report 460; IAEA: St. Petersburg, Russia; Vienna, Austria, 2016.
Song, Y.; Zhou, F.; Hao, Y.; Zhang, X.; Ji, P.; Li, Y. Measurement of Fast-Neutron Activation Cross Section of the ¹⁰⁹Ag(n,2n)^108mAg Reaction and Its Theoretical Calculation of Excitation Function. Appl. Radiat. Isot. 2024, 203, 111111. [Google Scholar] [CrossRef]
Wahlgren, M.A.; Meinke, W.W. Isomerism of Silver-108. Phys. Rev. 1960, 118, 181–183. [Google Scholar] [CrossRef]
Harbottle, G. The Half-Lives of Two Long-Lived Nuclear Isomers, ^108mAg and ^192m2Ir, and of ¹³⁷Cs and ²⁰⁴Tl. Radiochim. Acta 1970, 13, 132–134. [Google Scholar] [CrossRef]
Schötzig, U.; Schrader, H.; Debertin, K. Precision Measurements of Radioactive Decay Data. In Nuclear Data for Science and Technology; Qaim, S.M., Ed.; Springer: Berlin/Heidelberg, Germany, 1992; pp. 562–564. [Google Scholar] [CrossRef]
Schrader, H. Half-Life Measurements with Ionization Chambers—A Study of Systematic Effects and Results. Appl. Radiat. Isot. 2004, 60, 317–323. [Google Scholar] [CrossRef] [PubMed]
Shugart, H.A.; Browne, E.; Norman, E.B. Half-Lives of ¹⁰¹Rh^g and ¹⁰⁸Ag^m. Appl. Radiat. Isot. 2018, 136, 101–103. [Google Scholar] [CrossRef]
Golovko, V.V. Smart Cleanup: Using Advanced Statistics for Safer Environmental Remediation. In Proceedings of the WM2025 Conference, Phoenix, AZ, USA, 9–13 March 2025. [Google Scholar]
Singh, A.; Maichle, R.; Lee, S.E. On the Computation of a 95% Upper Confidence Limit of the Unknown Population Mean Based upon Data Sets with Below Detection Limit Observations; EPA Document EPA/600/R-06/022; U.S. Environmental Protection Agency Office of Research and Development: Washington, DC, USA, 2006.
Nikolić, M.; Scarcelli, G.; Tanner, K. Multimodal Microscale Mechanical Mapping of Cancer Cells in Complex Microenvironments. Biophys. J. 2022, 121, 3586–3599. [Google Scholar] [CrossRef]
Wu, M.; Liu, L.; Chan, C. Identification of Novel Targets for Breast Cancer by Exploring Gene Switches on a Genome Scale. BMC Genom. 2011, 12, 547. [Google Scholar] [CrossRef]
Golovko, V.V. Supporting Dataset for “Robust Method for Confidence Interval Estimation in Outlier-Prone Datasets”. 2025. Available online: https://osf.io/g2h3m/ (accessed on 7 May 2025).

Figure 1. A histogram of the fast-neutron (14.7 ± 0.2 MeV) activation cross-sections (see Table 1) of the ¹⁰⁹Ag(n, 2n)^108mAg reaction, showing the weighted average (399 mb), arithmetic mean (638 mb), and MFV (685 mb) as measures of central tendency. The smooth red curve shows the Gaussian fit of the data.

Figure 2. A histogram of fast-neutron (14.7 ± 0.2 MeV) activation cross-sections (re-evaluated) for the ¹⁰⁹Ag(n, 2n)^108mAg reaction displaying the weighted average (728 mb), arithmetic mean (718 mb), and MFV (709 mb) as indicators of central tendency. The smooth red curve shows the Gaussian fit of the data.

Figure 3. Histograms of randomized bootstrap sample values for four selected cross-section measurements ($x_{1}$, $x_{2}$, $x_{3}$, $x_{4}$). Each histogram is fitted with a Gaussian function, and the original values with uncertainties from Table 1 are marked with vertical black lines. Fit results and absolute percent differences are indicated.

Figure 4. A histogram of the MFV for fast-neutron (14.7 ± 0.2 MeV) activation cross-sections of the ¹⁰⁹Ag(n, 2n)^108mAg reaction from the data in Table 1 (original). The MFV and the 68.3% confidence interval obtained by hybrid parametric bootstrapping are also shown.

Figure 5. A histogram of the MFV for fast-neutron (14.7 ± 0.2 MeV) activation cross-sections of the ¹⁰⁹Ag(n, 2n)^108mAg reaction from the data in Table 2 (re-evaluated). The MFV and the 68.27% confidence interval obtained by hybrid parametric bootstrapping are also shown.

Figure 6. Histogram showing the distribution of hybrid parametric bootstrap simulated half-life values for the three most recent measurements of the ^108mAg isotope. Each color represents one bootstrapped half-life dataset: $418 \pm 15$ years (HL3_1), $437.7 \pm 7.7$ years (HL3_2), and $448 \pm 27$ years (HL3_3). The gray histogram in the background shows the combined distribution of all simulated values.

Table 1. Summary of fast-neutron ($E_{n} = 14.7 \pm 0.2$ MeV) activation cross-sections ($σ$) of the ¹⁰⁹Ag(n, 2n)^108mAg reaction.

$E_{n}$ (MeV)	$σ$ (mb)	Uncertainty (mb)	References	Year	Comments
14.70	670	266	[50]	1969	measured
14.50	220	12	[51]	1991	measured
14.50	263	20	[51]	1991	measured
14.83	767	24	[52]	1991	evaluation
14.60	232	8	[53]	1992	measured
14.80	236	7	[53]	1992	measured
14.70	628	42	[54]	1996	measured
14.70	682	49	[54]	1996	measured
14.50	697	60	[55]	1996	measured
14.50	621	29	[56]	1996	evaluation
14.80	648	31	[56]	1996	evaluation
14.80	721	20	[57]	1996	evaluation
14.50	695	40	[58]	1997	evaluation
14.90	709	41	[58]	1997	evaluation
14.50	643	30	[58]	1997	evaluation
14.80	671	31	[58]	1997	evaluation
14.70	651	44	[58]	1997	evaluation
14.70	706	51	[58]	1997	evaluation
14.50	677	82	[58]	1997	evaluation
14.50	716	44	[58]	1997	evaluation
14.77	784	25	[58]	1997	evaluation
14.83	795	25	[58]	1997	evaluation
14.60	790	27	[58]	1997	evaluation
14.80	805	24	[58]	1997	evaluation
14.80	800	55	[59]	2009	measured
14.50	727	41	[60]	2016	measured
14.80	747	42	[60]	2016	measured
14.80	650	43	[61]	2024	measured
14.80	602	40	[61]	2024	measured
14.80	601	37	[61]	2024	measured
14.80	616	23	[61]	2024	measured
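
As a rough cross-check of the central-tendency values quoted in the Figure 1 caption, the sketch below recomputes the arithmetic mean, an inverse-variance weighted average, and an MFV from the Table 1 values. The choice of inverse-variance weights, and the particular MFV iteration (Steiner's fixed-point scheme, started from the mean with an initial scale of (√3/2)·(x_max − x_min)), are assumptions about how those numbers were obtained. The function name `mfv`, the tolerance, and the iteration cap are illustrative and not taken from the article.

```python
import numpy as np

# Cross-sections (mb) and uncertainties (mb) transcribed from Table 1.
sigma = np.array([670, 220, 263, 767, 232, 236, 628, 682, 697, 621, 648,
                  721, 695, 709, 643, 671, 651, 706, 677, 716, 784, 795,
                  790, 805, 800, 727, 747, 650, 602, 601, 616], dtype=float)
unc = np.array([266, 12, 20, 24, 8, 7, 42, 49, 60, 29, 31,
                20, 40, 41, 30, 31, 44, 51, 82, 44, 25, 25,
                27, 24, 55, 41, 42, 43, 40, 37, 23], dtype=float)


def mfv(x, tol=1e-6, max_iter=200):
    """Most frequent value by fixed-point iteration (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()                               # assumed starting location
    eps2 = 0.75 * (x.max() - x.min()) ** 2     # (sqrt(3)/2 * range)^2
    for _ in range(max_iter):
        if eps2 <= 1e-12:                      # degenerate scale: stop
            break
        d2 = (x - m) ** 2
        denom = eps2 + d2
        # Location update: weights proportional to 1 / (eps^2 + d^2).
        m_new = np.sum(x / denom) / np.sum(1.0 / denom)
        # Scale update for the next pass.
        eps2_new = 3.0 * np.sum(d2 / denom**2) / np.sum(1.0 / denom**2)
        done = abs(m_new - m) < tol and abs(eps2_new - eps2) < tol * eps2
        m, eps2 = m_new, eps2_new
        if done:
            break
    return m


mean_sigma = sigma.mean()
weights = 1.0 / unc**2                         # assumed inverse-variance weights
weighted_sigma = np.sum(weights * sigma) / np.sum(weights)
mfv_sigma = mfv(sigma)

print(f"arithmetic mean : {mean_sigma:.1f} mb")
print(f"weighted average: {weighted_sigma:.1f} mb")
print(f"MFV             : {mfv_sigma:.1f} mb")
```

The weighted average is dominated by the four low values near 230 mb, which carry the smallest quoted uncertainties, while the MFV down-weights them because they lie far from the bulk of the data.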

Table 2. Re-evaluation of the fast-neutron cross-section ($σ_{2}$) for ^108mAg at neutron energy $E_{n} = 14.7 \pm 0.2$ MeV, as conducted by Song et al. [61] based on previous measurements ($σ_{1}$).

$E_{n}$ (MeV)	$σ_{1}$ (mb)	$σ_{2}$ (mb)	Reference
14.70	$628 \pm 42$	$658 \pm 44$	[54]
14.70	$682 \pm 49$	$715 \pm 41$	[54]
14.50	$697 \pm 60$	$705 \pm 61$	[55]
14.50	$621 \pm 29$	$651 \pm 30$	[56]
14.80	$648 \pm 31$	$679 \pm 32$	[56]
14.80	$721 \pm 18$	$755 \pm 21$	[57]
14.60	$232 \pm 8$	$800 \pm 28$	[53]
14.80	$236 \pm 7$	$814 \pm 24$	[53]
14.50	$220 \pm 12$	$759 \pm 41$	[51]
14.50	$263 \pm 20$	$907 \pm 69$	[51]
14.83	$767 \pm 24$	$804 \pm 25$	[52]
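
To illustrate the resample-then-simulate idea behind the MFV-HPB intervals, the following sketch applies a hybrid parametric bootstrap to the eleven $σ_{2}$ values in Table 2 only. It is not the author's implementation: the Gaussian noise model, the percentile interval, the number of bootstrap replicates, the seed, and the helper names `mfv` and `hpb_interval` are all assumptions. Because Table 2 lists only the re-evaluated entries rather than the full re-evaluated dataset, the output is not expected to reproduce the published MFV and confidence interval.

```python
import numpy as np

# Re-evaluated cross-sections sigma_2 (mb) and uncertainties (mb) from Table 2.
sigma2 = np.array([658, 715, 705, 651, 679, 755, 800, 814, 759, 907, 804],
                  dtype=float)
unc2 = np.array([44, 41, 61, 30, 32, 21, 28, 24, 41, 69, 25], dtype=float)


def mfv(x, tol=1e-6, max_iter=200):
    """Most frequent value by fixed-point iteration (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    eps2 = 0.75 * (x.max() - x.min()) ** 2
    for _ in range(max_iter):
        if eps2 <= 1e-12:
            break
        d2 = (x - m) ** 2
        denom = eps2 + d2
        m_new = np.sum(x / denom) / np.sum(1.0 / denom)
        eps2_new = 3.0 * np.sum(d2 / denom**2) / np.sum(1.0 / denom**2)
        done = abs(m_new - m) < tol and abs(eps2_new - eps2) < tol * eps2
        m, eps2 = m_new, eps2_new
        if done:
            break
    return m


def hpb_interval(values, uncertainties, n_boot=1000, level=0.6827, seed=0):
    """Percentile interval from a hybrid parametric bootstrap (assumed form)."""
    rng = np.random.default_rng(seed)
    n = len(values)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # resample data points
        sample = rng.normal(values[idx], uncertainties[idx])  # simulate new values
        estimates[b] = mfv(sample)
    alpha = (1.0 - level) / 2.0
    low, high = np.quantile(estimates, [alpha, 1.0 - alpha])
    return low, high


point = mfv(sigma2)
ci_low, ci_high = hpb_interval(sigma2, unc2)
print(f"MFV = {point:.0f} mb, 68.27% interval = [{ci_low:.0f}, {ci_high:.0f}] mb")
```

The nonparametric step (drawing indices with replacement) reflects scatter between measurements, and the parametric step (drawing from each point's quoted uncertainty) reflects the reported measurement error, so both contribute to the interval width.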

Table 3. Comparison between the simulated Gaussian fit results and original bootstrapped half-life values for the three datasets used in the bootstrap analysis. Small absolute percent differences indicate excellent agreement between the simulation and original data.

Origin	N	$μ$ (sim.)	$μ$ (orig.)	$σ$ (sim.)	$σ$ (orig.)	Absolute difference (%)
HL3_1	99,693	417.94	418.0	14.99	15.0	$Δ μ = 0.01$, $Δ σ = 0.10$
HL3_2	99,572	437.75	437.7	7.69	7.7	$Δ μ = 0.01$, $Δ σ = 0.19$
HL3_3	100,735	448.05	448.0	27.11	27.0	$Δ μ = 0.01$, $Δ σ = 0.41$
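
The kind of consistency check summarized in Table 3 can be sketched as follows: draw simulated half-lives from a Gaussian centered on each measurement and compare the recovered location and spread with the inputs. This is only an approximation of the procedure. It assumes a fixed 100,000 draws per measurement (the table reports slightly different N for each), uses sample moments in place of a Gaussian fit to the histogram, and the seed and variable names are illustrative.

```python
import numpy as np

# Half-life measurements (years) and uncertainties, as listed in Table 3.
half_lives = {
    "HL3_1": (418.0, 15.0),
    "HL3_2": (437.7, 7.7),
    "HL3_3": (448.0, 27.0),
}

rng = np.random.default_rng(42)   # fixed seed so the run is repeatable
n_draws = 100_000                 # assumed; Table 3 reports N per dataset

results = {}
for label, (mu, sd) in half_lives.items():
    draws = rng.normal(mu, sd, size=n_draws)       # parametric simulation step
    mu_sim = draws.mean()
    sd_sim = draws.std(ddof=1)
    d_mu = 100.0 * abs(mu_sim - mu) / mu           # absolute percent difference
    d_sd = 100.0 * abs(sd_sim - sd) / sd
    results[label] = (d_mu, d_sd)
    print(f"{label}: mu_sim={mu_sim:.2f}, sd_sim={sd_sim:.2f}, "
          f"d_mu={d_mu:.2f}%, d_sd={d_sd:.2f}%")
```

With this many draws, the percent differences should stay well below one percent, which is the behavior the table reports.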

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Golovko, V.V. Robust Method for Confidence Interval Estimation in Outlier-Prone Datasets: Application to Molecular and Biophysical Data. Biomolecules 2025, 15, 704. https://doi.org/10.3390/biom15050704

