Determining Univariate Equivalency of Additively Manufactured Parts

Lynch, Colin M.; Villalobos, Rene; Valadez Mesta, Brenda Leticia; Gomez Guillen, Cesar; Mireles, Jorge; Wicker, Ryan B.

doi:10.3390/jmmp10040134

by Colin M. Lynch ^1,2,*, Rene Villalobos ^1,3, Brenda Leticia Valadez Mesta ^4, Cesar Gomez Guillen ^4, Jorge Mireles ^5 and Ryan B. Wicker ^5,6,7

^1 Terra Integrated Solutions, Mesa, AZ 85202, USA
^2 School of Complex Adaptive Systems, Arizona State University, Tempe, AZ 85281, USA
^3 School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA
^4 W.M. Keck Center for 3D Innovation, University of Texas at El Paso, El Paso, TX 79968, USA
^5 Additive Manufacturing Education Partners, El Paso, TX 79912, USA
^6 Department of Aerospace and Mechanical Engineering, University of Texas at El Paso, El Paso, TX 79968, USA
^7 Trusted Metal, Evans, GA 30809, USA
^* Author to whom correspondence should be addressed.

J. Manuf. Mater. Process. 2026, 10(4), 134; https://doi.org/10.3390/jmmp10040134

Submission received: 9 March 2026 / Revised: 8 April 2026 / Accepted: 13 April 2026 / Published: 17 April 2026


Abstract

Additive manufacturing (AM) requires process-comparison tools that remain practical when sample generation and testing are costly. We propose a univariate, nonparametric workflow for comparing a candidate AM process to a stable reference process by testing distributional equivalency for a single response variable. The method discretizes the reference distribution into empirical percentile-defined bins and combines this representation with a sequential sampling protocol designed to reduce unnecessary sampling when evidence for equivalency or non-equivalency becomes sufficient. Simulation studies were used to evaluate operating characteristics across experimental settings. In a validation case study based on geometric measurements of laser-based powder bed fusion plate scans, the method correctly classified a candidate process expected to be equivalent to the reference and identified a non-equivalent process at the first sampling step. The workflow is most appropriate for low-sample, high-cost, or throughput-constrained settings, and is best viewed as a tool for process comparability, change control, calibration, and requalification support rather than as a standalone replacement for qualification standards. The full workflow is implemented in the open-source AMEquivalency package to support reproducible analysis.

Keywords: equivalency; additive manufacturing; sequential sampling; process comparability; laser-based powder bed fusion; nonparametric statistics

1. Introduction

Additive manufacturing (AM) has been defined as “a process of joining materials to make objects from 3D model data, usually layer upon layer, as opposed to subtractive manufacturing methodologies” [1]. The goal of AM is to rapidly and efficiently fabricate intricately designed parts, which may be impossible to produce with traditional machining techniques [2] using a wide array of materials, including polymers, metals, and ceramics [3]. Compared to traditional manufacturing, AM facilitates the production of highly customized and complex parts [4]; however, these benefits often imply an increase in economic investment and production time [5,6,7] along with poor geometric accuracy [8], repeatability, and reproducibility [9]. Coupled with these facts, the lack of standardization of best practices has slowed the development of quality assurance and control in AM [10]. Multiple institutions have attempted to implement industrial criteria for AM, including the International Organization of Standardization, ASTM International, the National Aeronautics and Space Administration, and America Makes [11,12,13,14,15], but benchmarks across these guidelines vary. For example, standards such as AMS7003, AMS7032, ISO/ASTM 52930, ISO/ASTM and NASA-STD-6030 [16,17,18,19,20,21] can be used for a type of AM called laser-based powder bed fusion of metals (PBF-LB/M), but each requires different sets of measurements, even though there is overlap among the various guidelines.

Recent work has continued to emphasize that qualification and certification remain major barriers to the broader industrial deployment of metal additive manufacturing. In high-consequence applications, qualification still depends heavily on costly inspection, destructive testing, and part-specific evidence, while the connection between in situ process signals and final part quality remains incomplete [22,23]. At the same time, recent reviews of powder bed fusion quality assessment and metal–laser-AM monitoring have highlighted substantial progress in sensing, monitoring, and control, but also persistent gaps in standardization, comparability, and practical industrial implementation [24,25]. These challenges extend beyond sensor development alone: recent work on standardized evaluation of in situ process-monitoring systems underscores that the field still lacks broadly accepted frameworks for objectively comparing monitoring outputs and deciding whether a modified process remains acceptably aligned with a trusted baseline [26]. Together, these developments reinforce the need for practical, interpretable, and sample-efficient statistical methods for AM process comparability, requalification, and change control.

Some of these attempts at standardization can introduce other problems as well. The quality of parts printed by PBF-LB/M can also be assessed with the Metallic Materials Properties Development and Standardization (MMPDS) Handbook [27]. This handbook outlines statistical guidelines for first characterizing the distributions of various mechanical properties (ultimate tensile strength, elastic modulus, elongation, etc.) and then uses a traditional hypothesis testing framework to determine whether the property meets some minimal standard. This framework, however, does not consider the high cost of samples and smaller average batch sizes of PBF-LB/M. First, sample size requirements depend on the shape of the underlying distribution of the mechanical property being tested. The shape of the underlying distribution may not be known a priori, and if nonparametric methods are required, then sample size requirements can also be prohibitively large (up to 299 samples, [27]). These large sample sizes can make the statistics robust to outliers, but if sample sizes are kept small, these outliers can have a disproportionate effect on parameter estimates. Finally, MMPDS depends on using traditional statistics to determine whether mechanical properties meet the minimally desired level. Determining whether two processes produce equivalent results is critical for qualification [28], calibration [29], and validation [30], and since statistical tests can be used to determine whether summary statistics such as the mean differ between groups (such as a t-test), at first blush this looks like a logical framework. However, traditional statistical practices are not well equipped to determine whether two populations are equivalent [31,32].

Under the traditional hypothesis testing framework, we assume that the shape of the underlying distribution is known and that we have some evidence of the values of the parameters defining that population, either from historical information or from statistical inference based on samples from the population. Thus, hypotheses are usually made around these values to determine whether a sample came from a population whose shape is defined by the hypothesized values. For instance, a classical null hypothesis is that two populations are the same in terms of the values of the parameters defining them; the alternative hypothesis is that they are different. This places the burden of proof on the wrong hypothesis, namely that there is a difference between the groups. Therefore, it is relatively easy to conclude incorrectly that two populations are the same when in fact they are not. This issue is magnified in studies where samples are expensive and thus scarce. These studies lack statistical power [33], so populations will appear equivalent even when they are not.

Equivalence tests were developed to resolve these issues [30,34]. Equivalence tests work by defining an acceptance region (the indifference zone; [35]) for the differences in means between two or more groups [36]. In one formulation of these tests, if the confidence interval for the differences between the groups lies within this region, the two are considered equivalent. If it lies outside of the region, or the confidence interval is so wide that it encompasses both limits of the acceptance region (which would happen for small sample sizes), then the populations are considered to be non-equivalent [31].

Equivalence testing has seen extensive use in psychological research and other medical fields [29,36,37,38,39], but has seen little use in manufacturing or quality control [32,40]. This could be due to the fact that equivalency tests are still limited in their applicability. Fundamentally, the standard equivalence test only determines equivalency with respect to differences in a point estimate of a distribution (which is almost always the mean). Differences in other aspects of the distribution (variance, skew, etc.) only matter insofar as they influence the mean. Additionally, various methods of realizing these equivalency tests depend in part on the underlying distribution, and a mismatch of distribution and method can lead to biased results [31].

Equivalence tests are also incomplete. These tests are only formulated for single variables, but manufactured parts can differ in potentially dozens of different relevant dimensions [41], so equivalency with respect to a single dimension is not sufficient. These variables may also be best characterized by different distributions, such as normal, lognormal, and Weibull [42], so the corresponding statistical test should be agnostic to the type of distribution. Finally, equivalency tests are often presented in a vacuum, when in reality the assumptions of the test (especially independence of measurements) also need to be explicitly tested. These tests, then, should be presented in a larger framework where other features of the experiment (such as stability) are assessed in conjunction with equivalency.

Other AM qualification and comparability approaches generally fall into three broad categories: specification-based acceptance methods tied to particular material/property standards, classical hypothesis-testing approaches focused on differences in summary statistics, and process-monitoring or high-throughput screening workflows intended to accelerate parameter exploration [22]. Each has clear value, but none directly resolves the specific problem addressed here: determining whether a candidate process is sufficiently similar to a stable reference distribution for a single response variable while remaining robust to distributional form and economically feasible at small sample sizes. Specification-based approaches are often application-specific and may require large datasets or stronger assumptions than are practical in some qualification settings; mean-based comparisons can miss changes in variance or tail behavior; and high-throughput workflows, while powerful in LPBF and related contexts, do not eliminate the need for decision rules when throughput is limited or artifacts are expensive [43]. Accordingly, there remains a need for a lightweight equivalency framework that is nonparametric, explicit about stability assumptions, and compatible with sequential stopping.

Here, we define a process that minimizes the cost of an experiment by discretizing the distribution of the performance metric and by evaluating information as the experiment progresses, stopping when there is sufficient information to determine whether or not two processes are equivalent. Discretization limits the resolution of the distribution, so smaller sample sizes are needed to resolve it. Sequential sampling allows the user to stop the experiment before the maximal sample size is attained, and thus can save resources. By using a nonparametric approach, we minimize the influence of extreme observations and avoid committing to a parametric family. However, outliers can be diagnostically meaningful in AM: they may reflect transient process faults (e.g., recoater events, spatter, parameter drift) or unusually favorable outcomes. For this reason, we view formal outlier detection and root-cause investigation as a complementary secondary analysis. We perform simulations to validate our approach, estimating the prevalence of Type I and Type II error rates for different experimental protocols. We integrate control charts into the process to ensure that the variable being measured is stable. Finally, we perform validation experiments to ensure that this statistical method has real-world value. To facilitate adoption and reproducibility, we provide an accompanying R package (AMEquivalency, version 3.0.0.9000) that implements control-chart assessment, percentile binning, sequential sampling schedules, and the equivalency decision rule [44]. Figure 1 and Box 1 summarize the workflow and the key equations used to assess equivalency. An additional tutorial is available in the GitHub repository.

Box 1. Overview of the proposed equivalency framework, showing the recommended analysis parameters, governing equations for bin construction and sequential sampling, experimental constraints, and the stopping rule used to assess equivalency between a candidate and reference process.

Variables:

Variable	Definition	Suggested Value
m	Number of Subgroups	20
n	Number of Samples per Subgroup	2
b	Number of Bins	8
$N_{\max}$	Maximal Sample Size	299
$N_{1}$	Initial Sample Size	5
T	Maximum Number of Sequential Samples	14
$α$	Sig. Level for Non-equivalent Stopping Rule	0.01

Equations:

$P_{i} = \frac{i}{b} \times 100$ for $i = 0, 1, \dots, b$ (1)

$N_{t + 1} = \frac{2 t T N_{1} + t^{2} - t^{2} N_{1} + t - t N_{1} - 2 + 2 N_{1}}{2 T}$ for $t = 1, 2, \dots, T - 1$ (2)

$χ_{t}^{2} = \sum_{i = 1}^{b} \frac{(O_{i} - E_{i})^{2}}{E_{i}}$ where $E_{i} = N_{t} / b$ (3)

$\tilde{V}_{t} = \frac{\sqrt{χ_{t}^{2} / N_{t}}}{\sqrt{b - 1}}$ (4)

$CI_{t}^{+} = \frac{1}{\sqrt{N_{t} (b - 1)}} \sqrt{χ_{α, b - 1}^{2}}$ (5)

Experimental Constraints:

$b \geq 2$, $N_{1} < N_{\max}$, $N_{1} < \frac{2 T N_{\max} - T^{2} - T + 2}{T^{2} - T + 2}$ (6)

Characterizing the reference distribution requires sampling m subgroups of size n and using an x-bar chart to determine whether the process is stable. If it is, use percentiles (Equation (1)) to divide the distribution into b equally likely bins. If it is not, perform root-cause analysis to determine the source of instability and then redraw samples. Determining whether the candidate distribution is equivalent to the reference requires a protocol for selecting the sample sizes of sequential samples (Equation (2)), measuring how evenly the observed counts are distributed across the bins (Equation (3)), converting this value to a standardized goodness-of-fit metric (Equation (4)), and comparing it to the upper bound of its expected value (Equation (5)). For each sample, check whether $\tilde{V}_{t} < CI_{t}^{+}$. If it is, keep sampling until $t = T$. If all samples fall within this bound, the processes are equivalent. If any one of these samples has $\tilde{V}_{t} \geq CI_{t}^{+}$, then the processes are not equivalent and sampling can stop there.
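The per-step comparison described in Box 1 can be sketched in a few lines of Python. This is a minimal illustration of Equations (3) to (5) under stated assumptions, not the AMEquivalency implementation: the function names are hypothetical, the critical value is assumed to be the upper-tail α point of a chi-square distribution with b − 1 degrees of freedom, and that value is passed in from a standard table rather than computed.

```python
import math


def chi_square_uniform(counts):
    """Equation (3): discrepancy between observed bin counts and the
    equal expected count E_i = N_t / b."""
    n_t = sum(counts)
    expected = n_t / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def v_tilde(counts):
    """Equation (4): standardized goodness-of-fit metric."""
    n_t = sum(counts)
    b = len(counts)
    return math.sqrt(chi_square_uniform(counts) / n_t) / math.sqrt(b - 1)


def ci_upper(n_t, b, chi2_crit):
    """Equation (5): upper bound, given the critical value chi^2_{alpha, b-1}."""
    return math.sqrt(chi2_crit) / math.sqrt(n_t * (b - 1))


def step_is_consistent(counts, chi2_crit):
    """True when V~_t < CI_t^+, i.e. sampling may continue at this step."""
    return v_tilde(counts) < ci_upper(sum(counts), len(counts), chi2_crit)


# Assumption: upper 1% point of chi-square with 7 degrees of freedom
# (b = 8, alpha = 0.01), taken from a standard table.
CHI2_CRIT_DF7_ALPHA01 = 18.475

# Hypothetical bin counts for 40 candidate observations.
even_counts = [5, 5, 5, 5, 5, 5, 5, 5]       # spread evenly across bins
skewed_counts = [20, 10, 5, 3, 2, 0, 0, 0]   # concentrated in the low bins

print(step_is_consistent(even_counts, CHI2_CRIT_DF7_ALPHA01))
print(step_is_consistent(skewed_counts, CHI2_CRIT_DF7_ALPHA01))
```

Because both sides of the comparison share the factor $1 / \sqrt{N_{t} (b - 1)}$, the check is algebraically the same as comparing $χ_{t}^{2}$ with the critical value; the normalized form is kept here only to mirror the notation of Box 1.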

2. Characterizing the Reference Distribution

Qualification and certification are critical steps for developing and manufacturing products [45]. Qualification involves examining a prototype, a material, or a product during its production [28]. Many different aspects of AM production can be qualified, including techniques, machines, materials, parts, and suppliers [28]. These same processes can also be certified, which is primarily established by satisfying a centralized authority or organization [28]. In both cases, properties of the process to be qualified or certified must be assessed and compared to some industry standard [27]. Here, we describe how a stable reference distribution can be established for a single response variable to enable principled comparisons across builds, machines, parameter sets, or suppliers (Figure 1A).

First, a reference process should be selected. A reference process should ideally already be certified. That is, it should be able to produce products that meet institutional requirements with respect to a performance metric, such as tensile strength [12], although some care should be taken to ensure that such metrics are sensitive to underlying defects or other undesirable characteristics [46]. If such a process is not available, then there should be a good reason to believe that the reference process is reliable or otherwise represents a gold standard of production. These criteria are considered representative of an acceptable reference process since they indicate a degree of repeatability (the parts produced by the machine have similar mechanical properties per material) and provide a higher level of user control (most process parameters are available to the user).

Second, a sample of the reference process should be taken. This sample should serve a dual purpose: it can be used to assess process stability, and it can be used to characterize the reference distribution. In the field of statistical quality control, a process is stable if sequential measurements show no persistent pattern and thus can be modeled as Gaussian noise [47]. Additionally, if there are no outliers that draw samples outside of process control limits, then the process is said to be ‘in control’ [48]. Stability is a necessary condition for the reference process, as otherwise it is untrustworthy.

In Section 2.1, we show how control charts can be used to assess stability, and in Section 2.2, we discuss how percentiles can be used to characterize the reference distribution. In Section 2.3, we will give recommendations for sample sizes.

2.1. Establishing Process Stability with Control Charts

To determine whether a process is stable, researchers in additive manufacturing can use control charts [49]. Control charts, also known as Shewhart charts or process-behavior charts, are a key tool in statistical process control (SPC). They help monitor how a process changes over time by plotting data points in time order [50]. While there are multiple types of control charts, including CUSUM and EWMA charts [51], we will focus on the Shewhart control chart, as it is the chart type that has been the focus of the most optimization studies [52].

A Shewhart control chart consists of three lines and a series of plotted points (Figure 2; [48]). The chart is built by taking measurements of the focal metric over time. The data are grouped by time period, location, or another blocking factor of the process. In an AM context, this factor could capture the number of products on a single build plate. These sets of data are called subgroups, and each subgroup has more than one observation. One then takes the mean of each subgroup and plots these means chronologically on the control chart. The global average of the subgroup means gives the center line. The lines above and below the center line (the upper and lower control limits, respectively) are derived from the center line, the range of the metric within each subgroup, and the subgroup size n. One then uses various rules to determine whether the process is in control (Table 1; [53]), although different authors use different variations of these rules (e.g., [50]). If a sample passes each of these tests, then the reference process is considered stable and can be characterized. If it does not pass, the source of special-cause variation should be identified and corrected, and samples should be retaken, although this is beyond the scope of this publication. For candidate processes, control-chart assessment is best performed after all planned samples are collected (or after enough subgroups exist to estimate limits reliably), because early control limits can be noisy and may yield false alarms (a stable process appears unstable).
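As a rough illustration of how the three lines are obtained, the sketch below computes x-bar chart limits from subgroup means and ranges using the conventional tabulated A2 constants. It is a simplified example rather than the procedure used in the paper: the function names and data are hypothetical, and it checks only the most basic rule (a subgroup mean beyond a control limit), not the full rule set referred to in Table 1.

```python
from statistics import mean

# Conventional tabulated A2 constants for x-bar charts built from
# subgroup ranges, indexed by subgroup size n.
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}


def xbar_limits(subgroups):
    """Return (lower limit, center line, upper limit, subgroup means)."""
    n = len(subgroups[0])
    if any(len(group) != n for group in subgroups):
        raise ValueError("all subgroups must have the same size")
    means = [mean(group) for group in subgroups]
    ranges = [max(group) - min(group) for group in subgroups]
    center = mean(means)                 # global average of subgroup means
    half_width = A2[n] * mean(ranges)    # A2 times the average range
    return center - half_width, center, center + half_width, means


def points_beyond_limits(subgroups):
    """Indices of subgroups whose mean falls outside the control limits."""
    lower, _, upper, means = xbar_limits(subgroups)
    return [i for i, m in enumerate(means) if m < lower or m > upper]


# Hypothetical measurements: six chronologically ordered subgroups, n = 2.
stable = [[10.0, 10.2], [9.9, 10.1], [10.1, 10.3],
          [9.8, 10.0], [10.0, 10.1], [10.1, 10.2]]

print(xbar_limits(stable)[:3])
print(points_beyond_limits(stable))
```

Six subgroups are used only to keep the example short; the text recommends at least 20 subgroups before the limits are treated as reliable.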

2.2. Dividing Reference Distribution into Discrete Bins

In many situations, characterizing a distribution requires fitting various families of distributions to data and then determining which of these fits the best [54]. However, such procedures can be sensitive to outliers [55] and since various goodness-of-fit tests can be insignificant for multiple types of distributions, there can be some ambiguity when two distribution types seem to describe the same empirical distribution well. Additionally, when determining whether or not two distributions are equivalent, it is not necessary to know the exact distribution type (if one even exists for a particular dataset). It is sufficient to know where future datapoints are likely to occur. We can therefore sidestep the issue of an arbitrary selection by characterizing the reference distribution nonparametrically. That is, we can describe the distribution with empirical percentiles rather than with point estimates of distribution parameters. These percentiles divide the distribution into equal-probability bins, and we can use these bin limits to compare the reference to the candidate process via $χ^{2}$-like goodness-of-fit tests.

Percentile binning introduces an intentional loss of resolution, and this is both a limitation and a design feature of the method. By converting a continuous distribution into b empirical bins, the procedure creates an explicit trade-off between distributional resolution and feasibility. Increasing the number of bins improves resolution and allows finer distinctions between candidate and reference distributions, but it also increases sample-size requirements and can reduce statistical power [56]. Conversely, using fewer bins improves feasibility and may increase power, but at the cost of reduced information content, since small shifts that occur largely within bins may be blurred and therefore remain undetected. The choice of b should therefore be treated as a user-selected design parameter rather than a universal constant, allowing investigators to determine what level of resolution is worth the associated sampling cost. This binning strategy also provides some robustness to extreme observations, since outliers are absorbed into the first or last bins rather than exerting disproportionate influence on the full characterization procedure. More generally, the use of a $χ^{2}$-based goodness-of-fit framework can provide favorable power relative to some alternatives under certain conditions [56], further supporting its use in small-sample settings. For these reasons, we view the proposed method as especially appropriate when investigators prefer a coarse but robust nonparametric comparison that can be supported by limited sample sizes. When the primary goal is to detect very subtle distributional shifts and larger samples are available, methods that operate directly on continuous distributions may be preferable.

To set percentile P at index i for a given reference distribution,

$P_{i} = \frac{i}{b} \times 100$ for $i = 0, 1, \dots, b$ (7)

For example, if $b = 4$, then $P = [0, 25, 50, 75, 100]$, where P is the vector that contains each percentile. These percentiles define the range of each bin. For instance, if an observation of the metric of interest from the candidate process falls around the 10th percentile of that same metric for the reference process, then it is placed in the first bin, which is bounded by the 0th and 25th percentiles.
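The binning step can be illustrated with Python's standard library. This is a sketch under assumptions rather than the package's own routine: the helper names are hypothetical, the interior cut points are estimated with `statistics.quantiles` using its "inclusive" interpolation (the percentile estimator used by AMEquivalency may differ), and the first and last bins are treated as open-ended so that any candidate value can be assigned to a bin.

```python
from bisect import bisect_right
from statistics import quantiles


def interior_bin_edges(reference, b):
    """Estimate the b - 1 interior cut points (P_1 ... P_{b-1} in Equation (7))
    from the reference sample."""
    return quantiles(reference, n=b, method="inclusive")


def bin_counts(candidate, edges):
    """Count candidate observations per bin; values beyond the outermost
    cut points fall into the first or last bin."""
    counts = [0] * (len(edges) + 1)
    for value in candidate:
        counts[bisect_right(edges, value)] += 1
    return counts


# Hypothetical reference sample of 40 measurements and b = 4 bins.
reference = [float(x) for x in range(1, 41)]
edges = interior_bin_edges(reference, b=4)

# A candidate value near the 10th percentile of the reference lands in
# the first bin (index 0), as in the example in the text.
print(bisect_right(edges, 4.0))
print(bin_counts(reference, edges))
```

Leaving the outer bins open-ended matches the observation above that extreme values are absorbed into the first or last bin.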

2.3. Setting the Sample Size for the Reference Distribution

The production of parts with PBF-LB/M, as well as with other AM methods, is expensive, so sample sizes should be kept as small as possible. Still, characterizing the reference distribution requires more than demonstrating stability alone. Although statistical control is a necessary prerequisite, the reference distribution must also be based on measurements obtained under an adequate and consistent measurement system, from sampling conditions that are representative of the machine–material–parameter configuration of interest, and with appropriate attention to spatial and temporal sources of variation that could affect the intended equivalency claim. We therefore suggest a fixed sample size method for assessing the reference distribution derived from constraints set by both control charts and $χ^{2}$ tests.

Control charts require n samples from m chronologically ordered groups, so the total sample size is $N = n m$. Generally, $2 \leq n \leq 10$ and $20 \leq m \leq 30$ ([48]; although it is also possible to build control charts with single measurements where $n = 1$; [57]), so the minimal sample size needed to assess stability is $N = 2 \times 20 = 40$. The $χ^{2}$ test requires that the expected number of observations in each bin be at least 5, so when there are b bins, the minimal sample size is $N = 5 b$. Setting these requirements equal to one another means that for a sample size of 40, we can assess $b = 8$ bins. Therefore, characterizing the reference distribution requires at least 40 samples at a resolution of 8 bins. This represents the lowest sample size limit for characterizing the reference process, although sample sizes can go up to $N = 10 \times 30 = 300$ if the budget permits it. Here, the upper limit for b is 60. It should be noted, however, that there is some evidence that smaller within-subgroup sample sizes ($n = 2$) can outperform larger subgroup sample sizes (n = 4–6) for $m = 20$ [58] in terms of determining stability, so there may be diminishing returns from increasing the sample size. As control limits constrict with large sample sizes, many samples may appear to be out of control when there is no special cause of variation, so the false positive rate can be artificially high with large sample sizes [59]. Additionally, we show that the bin limits stabilize around their true values at $N = 40$ using simulations (Appendix A), so this small sample size should be feasible for many applications. That said, after qualification, the process under question may continue printing the same parts, so control charts should be used to continue monitoring stability. These additional sample points can then be pooled into the existing dataset, and bin limits can be updated accordingly (Appendix B). This recommendation for $N = 40$, then, only represents the starting point of characterizing the reference distribution.
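The arithmetic linking the control-chart design to the number of bins can be written out as a short check. The helper below is hypothetical and simply encodes the two relations used above, $N = n m$ and a minimum expected count of 5 per bin.

```python
def reference_design(n, m, min_expected_per_bin=5):
    """Return (total sample size N, largest supported number of bins b)
    for m subgroups of size n."""
    total = n * m
    max_bins = total // min_expected_per_bin
    return total, max_bins


print(reference_design(n=2, m=20))    # smallest recommended design
print(reference_design(n=10, m=30))   # largest design discussed in the text
```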

3. Optimal Sequential Sampling for Equivalency

After the reference distribution has been characterized, it can be compared to multiple candidate processes. In this situation, there are clear null and alternative hypotheses. The null is that the candidate distribution is equivalent to that of the reference distribution. Numerically, this means that observations from the candidate process are uniformly distributed across the bins defined by the reference process. If this distribution is anything other than uniform, then the two processes are not equivalent. As there is a clear alternative hypothesis, we can save on production costs of parts by using an iterative sampling method. Here, we do not sample all at once, instead dividing the sample into smaller sub-samples and assessing equivalency along the way.

In practice, equivalency should only be evaluated for processes that are demonstrably stable with respect to the metric of interest. In this work, candidate-process stability is treated as a prerequisite to equivalency testing rather than as a condition that can be assumed without verification. This distinction is especially important in additive manufacturing, where both temporal variability (e.g., drift across builds, recoater wear, optics degradation, powder aging, or environmental changes) and spatial variability (e.g., location-dependent effects within a build plate) may introduce a systematic structure into the data. Accordingly, we assume that the submitting entity has established process stability using time-ordered statistical process control (SPC) evidence prior to equivalency evaluation, for example through control charts or an equivalent monitoring framework applied to a stable machine–material–parameter configuration [27].

Concretely, this means that the measurements used to define the reference distribution, as well as those collected from the candidate process, should be drawn from production conditions for which no unresolved out-of-control signals, sustained trends, or nonrandom patterns are present. When relevant, stability assessment should also account for within-build spatial heterogeneity by either controlling artifact location, randomizing sampling locations, or explicitly demonstrating that positional effects are acceptably small relative to the intended equivalency claim.

In addition, because the proposed framework operates on the observed measurements, equivalency decisions reflect the combined effects of intrinsic process variability and measurement-system uncertainty. For this reason, the method should be applied only when the measurement system has also been shown to be adequately controlled, for example through calibration, repeatability/reproducibility studies, or equivalent metrological evidence appropriate to the application. If measurement error is large relative to the process differences of interest, or if measurement conditions differ systematically between the reference and candidate datasets, equivalency decisions may become difficult to interpret because measurement noise can either obscure real process differences or induce apparent non-equivalence. If instability is detected, whether in the process itself or in the measurement system, the appropriate response is not to proceed directly to equivalency testing, but to first identify and address the source of variation and then re-establish control. The proposed framework should therefore be interpreted as operating after both the process and the measurement system have been shown to be adequately controlled, not as a substitute for the broader process-validation, metrology, and SPC activities required to demonstrate that control. This is consistent with the broader qualification ethos that process capability and control are prerequisites to statistical accept/reject decisions [27].

Here, we specify a sequential sampling method specifically geared to test equivalency in a way that optimally trades off the information gain from sequential sampling with the marginal costs of those samples. This optimization protocol results in a set of closed-form equations where users set the parameters so that the sampling strategy is tailored to their needs. We validate these results with simulations and give general guidelines on how to set the experimental parameters. To derive these equations, we follow these steps:

1. We show that the $χ^{2}$ statistic is a poor statistic for assessing equivalency, as it is independent of sample size (N).
2. We show that Cramér’s V has a predictable relationship with N, so it makes for a better measure of sample quality.
3. We define functions for setting the sample sizes of sequential samples, the information gain of sequential samples (derived from Cramér’s V), and their marginal costs.
4. We combine these functions into a single objective function to be minimized by the sequential sampling strategy.
5. We constrain the objective function by limiting the space of possible solutions.
6. We find the sampling strategy that minimizes the objective function in the acceptable domain.
7. We define a stopping rule for sequential sampling when the candidate distribution is sufficiently different from the reference.
8. We discuss the potential utility of outlier detection as a complementary approach to equivalency.
9. We optimize experimental parameters for various non-equivalent scenarios with power analyses via simulations and compare final sample sizes to those required by the fixed sample size method.
10. We summarize the process so that readers have the minimal information needed to perform sequential sampling.
11. We discuss a different formulation of equivalency based on current standards, and discuss how the two methods align.

To begin, we assume that percentiles have already been used to characterize the reference distribution and are available for comparison with the candidate distribution.

3.1. $χ^{2}$ Statistic Is Invariant to Sample Size

Once the continuous reference distribution has been discretized into b empirical percentile bins, the comparison problem becomes one of assessing whether the candidate observations are allocated across those bins as expected under equivalency. Because bins are defined by reference percentiles, each bin has the same expected probability under the reference distribution, and therefore a candidate sample of size N should yield expected bin counts of approximately

N / b

under equivalency. This converts the problem into a multinomial goodness-of-fit setting, for which a

χ^{2}

-type discrepancy statistic provides a natural and interpretable measure of departure between the observed and expected counts. An additional advantage of this choice is that the contribution of each bin to the overall discrepancy can be examined directly, and the resulting statistic can be normalized through Cramér’s V to support comparisons across sample sizes within the sequential sampling framework. In this context, the

χ^{2}

test statistic captures the difference between the expected counts from a fitted distribution (

E_{i}

) and the observed counts (

O_{i}

) within each bin:

χ^{2} = \sum_{i = 1}^{b} \frac{{(O_{i} - E_{i})}^{2}}{E_{i}}

(8)

where b is the number of bins. One may expect that as the sample size (N) increases,

χ^{2}

should decrease, perhaps asymptotically approaching some constant value if the null hypothesis is true; otherwise it may diverge. To calculate the expected value of

χ^{2}

across N given the null hypothesis, we can simplify Equation (8) as

E_{1} = E_{2} = \dots = E_{b} = E [O_{i}] = N / b

. This, coupled with the fact that the number of observations in each bin should be equivalent, means that the summation is unnecessary and we can simply multiply by the number of summation steps, so

E [χ^{2}] = b E [{(O_{i} - E [O_{i}])}^{2} / E [O_{i}]] = \frac{b}{E [O_{i}]} E [{(O_{i} - E [O_{i}])}^{2}] = \frac{b}{E [O_{i}]} Var (O_{i})

(9)

As

O_{i}

is a binomial random variable (an observation may or may not occur in bin i),

Var (O_{i}) = N p (1 - p)

where

p = 1 / b

and

E [O_{i}] = N / b

, so:

E [χ^{2}] = \frac{b}{N / b} \frac{N}{b} (1 - \frac{1}{b}) = b (1 - \frac{1}{b})

(10)

Thus

E [χ^{2}] = b - 1

(11)

Notice that N cancels out of this expression; therefore,

E [χ^{2}]

will not change with the sample size, so it cannot be used to compare the quality of two samples with different sample sizes. The larger sample may have the higher

χ^{2}

value not due to any features or bugs of the sampling strategy, but rather due to a random fluctuation.
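The invariance in Equation (11) can be checked numerically. The following is a minimal Python sketch, not part of the AMEquivalency package; the function names, seed, and replicate count are illustrative assumptions. It draws candidate samples uniformly over b bins, as expected under equivalency, and averages the statistic of Equation (8) at two sample sizes.

```python
import random

def chi2_stat(counts):
    """Chi-square statistic of Equation (8) with equal expected counts N / b."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

def mean_chi2_under_null(n, b, reps=2000, seed=1):
    """Average chi-square over `reps` samples of size n drawn uniformly over b bins."""
    rng = random.Random(seed)  # fixed seed (assumption) so the run is repeatable
    total = 0.0
    for _ in range(reps):
        counts = [0] * b
        for _ in range(n):
            counts[rng.randrange(b)] += 1
        total += chi2_stat(counts)
    return total / reps

if __name__ == "__main__":
    for n in (20, 200):
        # Both averages should sit close to b - 1 = 7 despite the tenfold change in N.
        print(n, round(mean_chi2_under_null(n, b=8), 2))
```

Because the average stays near b - 1 at both sample sizes, a raw comparison of the statistic across sequential steps says little about which sample fits the reference better.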

3.2. Cramér’s V Varies Predictably with Sample Size

Cramér’s V is the measure of the effect size of a

χ^{2}

test, measuring the association between nominal variables [60]. If

V = 0

, then there is no association between the variables; therefore, the expected counts of each cell in each row are equivalent. Conversely, if it is high, then there is a strong association; the counts are different from one another. Cramér’s V is given by the expression

V = \sqrt{\frac{χ^{2} / N}{\min (b - 1, r - 1)}}

(12)

where r is the number of rows in a contingency table. For our equivalency tests,

r = 2

, and the lowest possible value of

b = 2

, so the denominator of this expression is always 1. Therefore, this expression simplifies to

V = \sqrt{χ^{2} / N}

(13)

To make this value more interpretable, we can normalize it so that it varies between 0 and 1, where 0 indicates that the observed and expected counts are identical and 1 represents maximal difference between these counts. We can normalize this statistic with the equation

\tilde{V} = \frac{V - V_{\min}}{V_{\max} - V_{\min}}

(14)

V_{\min}

occurs when

χ^{2}

is at its smallest possible value where

O_{i} = E_{i}

, so

χ_{\min}^{2} = 0 + 0 + \dots + 0 = 0

, thus

V_{\min} = \sqrt{0 / N} = 0

.

V_{\max}

occurs when

χ^{2}

is at its largest possible value (

χ_{\max}^{2}

), which occurs when all observations occur in a single bin (

O_{1} = N, O_{i > 1} = 0)

, so

χ_{\max}^{2} = \frac{{(N - N / b)}^{2}}{N / b} + \frac{{(0 - N / b)}^{2}}{N / b} + \dots + \frac{{(0 - N / b)}^{2}}{N / b} = N b - N

(15)

and therefore

V_{\max} = \sqrt{\frac{N b - N}{N}} = \sqrt{b - 1}

(16)

Plugging this into Equation (14) along with

V_{\min}

and V and simplifying yields

\tilde{V} = \frac{\sqrt{χ^{2} / N}}{\sqrt{b - 1}}

(17)
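As an illustration of Equation (17), the sketch below bins candidate measurements at the reference percentiles and computes the normalized statistic. It is a hedged example rather than the package implementation: the percentile method (the default of Python's `statistics.quantiles`), the tie handling (`bisect_right`), and the function names are all assumptions.

```python
import bisect
import math
import statistics

def bin_counts(reference, candidate, b):
    """Count candidate values in the b bins cut at the reference percentiles."""
    cuts = statistics.quantiles(reference, n=b)  # b - 1 interior cut points
    counts = [0] * b
    for x in candidate:
        # Values equal to a cut point go to the higher bin (assumption).
        counts[bisect.bisect_right(cuts, x)] += 1
    return counts

def v_tilde(counts):
    """Normalized Cramér's V of Equation (17) for equal expected counts N / b."""
    n, b = sum(counts), len(counts)
    expected = n / b
    chi2 = sum((o - expected) ** 2 / expected for o in counts)
    return math.sqrt(chi2 / n) / math.sqrt(b - 1)

if __name__ == "__main__":
    reference = list(range(1, 41))          # hypothetical reference measurements
    candidate = [3, 12, 18, 22, 27, 31, 36, 39]
    counts = bin_counts(reference, candidate, b=4)
    print(counts, round(v_tilde(counts), 3))
```

A candidate sample spread evenly over the bins gives a value of 0, and a sample that falls entirely in one bin gives a value of 1, matching the normalization described above.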

To get the expected value of

\tilde{V}

, we first separate the random variable

χ^{2}

from the rest of the expression:

\tilde{V} = \frac{1}{\sqrt{N (b - 1)}} χ

(18)

which means that

\tilde{V}

is a

χ

-distributed random variable. To get the expected value of

\tilde{V}

, we multiply the expected value of a

χ

distribution against the scaling factor on the left:

E [\tilde{V}] = \frac{1}{\sqrt{N (b - 1)}} E [χ] = \frac{1}{\sqrt{N (b - 1)}} (\frac{\sqrt{2} Γ (\frac{b}{2})}{Γ (\frac{b - 1}{2})})

(19)

We further simplify by moving b to the right-hand side of this expression:

E [\tilde{V}] = \frac{1}{\sqrt{N}} (\frac{\sqrt{2} Γ (\frac{b}{2})}{\sqrt{b - 1} Γ (\frac{b - 1}{2})})

(20)

The right side of this expression acts as a correction factor, adjusting the expected value of

\tilde{V}

given different values of b. Its minimum is approximately 0.798 when

b = 2

and approaches 1 when

b \to \infty

. To simplify further derivations, we define the function

A (b)

so that it contains all of the expressions related to b:

A (b) = \frac{\sqrt{2} Γ (\frac{b}{2})}{\sqrt{b - 1} Γ (\frac{b - 1}{2})}

(21)

which gives us our final equation for

E [\tilde{V}]

(Figure 3), where

A (b)

can be understood as the degree to which

\tilde{V}

needs to be corrected to account for the fact that it derives from a

χ

distribution rather than a

χ^{2}

distribution:

E [\tilde{V}] = \frac{1}{\sqrt{N}} A (b)

(22)
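Equations (21) and (22) are straightforward to evaluate. A minimal sketch follows, assuming only the Python standard library; the function names are hypothetical, and the log-gamma form is used simply to avoid overflow for large b.

```python
import math

def a_factor(b):
    """Correction factor A(b) of Equation (21), evaluated on the log scale."""
    log_ratio = math.lgamma(b / 2) - math.lgamma((b - 1) / 2)
    return math.sqrt(2.0 / (b - 1)) * math.exp(log_ratio)

def expected_v_tilde(n, b):
    """Expected normalized Cramér's V under equivalency, Equation (22)."""
    return a_factor(b) / math.sqrt(n)

if __name__ == "__main__":
    for b in (2, 8, 100):
        print(b, round(a_factor(b), 4))
    for n in (5, 20, 40):
        print(n, round(expected_v_tilde(n, b=8), 4))
```

For b = 2 the factor equals the square root of 2/π, which is the value of roughly 0.798 quoted above, and the expected statistic shrinks in proportion to the inverse square root of N.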

3.3. Defining the Sampling Function, Information Gain, and Marginal Costs of Sequential Samples

Let t be the sampling step and

t \in {1, 2, \dots, T}

, where T is an experimental parameter set by the user which determines the maximal number of sequential samples they are willing to take. The sample size at each step is

N_{t}

, which cumulatively adds the number of samples from each step. For instance, if

N_{1} = 10

and the user wants to add 5 samples, then

N_{2} = 15

. This iterative process can be expressed as

N_{t + 1} = N_{t} + f (N_{t + 1})

(23)

where

N_{1}

is an initialization parameter set by the user and

f (N_{t + 1})

determines how many samples will be added to

N_{t}

to produce

N_{t + 1}

. We assume that

f (N_{t + 1})

is linear where sequential values of

N_{t}

increase or decrease at a constant rate s and is scaled by

N_{1}

,

N_{t + 1} = N_{t} + N_{1} + s (t + 1)

(24)

which has the closed form:

N_{t} = t N_{1} + s \frac{(t - 1) (t + 2)}{2}

(25)

Our optimization goal is to find the value of s which simultaneously maximizes the information gain of the next sample and minimizes its cost. We hypothesize that the optimal value of s depends in part on the underlying shape of the cost function. Generally, the cost of producing

N_{t}

parts can be calculated as

C_{t} = F + d N_{t} + q N_{t}^{λ}

(26)

where F is the fixed cost of producing parts, d is the per-unit cost of testing the quality of

N_{t}

parts, q is the per-unit cost of producing

N_{t}

parts, and

λ

determines the scaling relationship between the sample size and the cost of each part ([61]; Figure 4). To define the optimization problem, we focus on the costs that vary with the sequential sampling decision itself rather than attempting to model the full cost of qualification. Accordingly, fixed costs are omitted because they remain constant across the candidate sequential strategies considered here and therefore do not affect which strategy is preferred. We also omit testing cost in the present formulation because every produced part is assumed to be measured for equivalency evaluation, so measurement cost scales directly with the number of produced parts and does not introduce an additional trade-off separate from sample production. Under these assumptions, the relevant cost quantity is the incremental cost associated with continuing to produce and evaluate additional samples before a stopping decision is reached. Therefore the cost equation simplifies to

C_{t} = q N_{t}^{λ}

(27)

When

0 < λ < 1

, the marginal cost of production diminishes with sample size, perhaps as a result of some efficiency that can be leveraged to produce parts at scale. Here, s can be greater than 0, as later samples cost less than earlier samples. When

λ = 1

, then the scaling relationship is linear; the cost per part does not change with the total number of parts produced. Spacing between sequential samples can therefore also be linear; there will be no difference in costs of later samples versus earlier ones. Finally, if

λ > 1

, then later samples will be more expensive than earlier samples, so we can set

s < 0

to avoid the region of high costs.

The marginal cost of the next sample is given as

C_{t + 1}^{gain} = C_{t + 1} - C_{t} = q N_{t + 1}^{λ} - q N_{t}^{λ}

(28)

whereas the expected information gained from the next sample will be measured as the expected reduction in

\tilde{V}

across samples

{\tilde{V}}_{t + 1}^{loss} = E [{\tilde{V}}_{t + 1}] - E [{\tilde{V}}_{t}] = A (b) \frac{1}{\sqrt{N_{t + 1}}} - A (b) \frac{1}{\sqrt{N_{t}}}

(29)

which simplifies to

{\tilde{V}}_{t + 1}^{loss} = A (b) (\frac{1}{\sqrt{N_{t + 1}}} - \frac{1}{\sqrt{N_{t}}})

(30)

Note that this cost model is intentionally simplified. It is not intended to capture the full economics of any single AM process and does not explicitly represent build-level batching, setup, fixturing, post-processing, queue time, or parallel specimen production. In LPBF especially, multiple artifacts may be produced in a single build, which can flatten or otherwise alter the marginal-cost structure assumed by the model [43]. Accordingly, the role of the current cost function is heuristic: it provides a transparent way to encode whether marginal sampling cost is expected to increase, decrease, or remain approximately linear across sampling steps. Future work should replace this abstraction with process-specific cost models that explicitly incorporate batching, measurement constraints, and build-level economics.
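To make the roles of Equations (28) and (30) concrete, the following sketch compares two equally sized increments under different values of λ. It is illustrative only: the function names, the unit cost q = 1, and the example sample sizes are assumptions.

```python
import math

def marginal_cost(n_prev, n_next, q=1.0, lam=1.0):
    """Marginal cost of growing the sample from n_prev to n_next, Equation (28)."""
    return q * n_next ** lam - q * n_prev ** lam

def v_tilde_loss(n_prev, n_next, a_b):
    """Expected change in normalized Cramér's V, Equation (30); negative when N grows."""
    return a_b * (1 / math.sqrt(n_next) - 1 / math.sqrt(n_prev))

if __name__ == "__main__":
    for lam in (0.5, 1.0, 2.0):
        first = marginal_cost(10, 20, lam=lam)   # increment from 10 to 20 parts
        second = marginal_cost(20, 30, lam=lam)  # same-sized increment, taken later
        print(lam, round(first, 3), round(second, 3))
    # The expected reduction in V shrinks for later increments of equal size.
    print(round(v_tilde_loss(10, 20, 1.0), 4), round(v_tilde_loss(20, 30, 1.0), 4))
```

The later increment is cheaper than the earlier one when λ is below 1, equally costly when λ equals 1, and more expensive when λ exceeds 1, while the expected reduction in the statistic diminishes in every case.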

3.4. Defining the Objective Function to Trade off Information Gain with Marginal Cost

Our optimization goal is to find the value of s which maximizes information gain and minimizes marginal cost. To do this, we perform linear scalarization: we transform both objectives so that they are on the same scale, and then we add the two to generate the objective function

L

. We then find the value of s which minimizes

L

.

First, as our goal is to minimize the objective function, we take the negative of the information gain so that

{\tilde{V}}_{t + 1}^{gain} = - A (b) (\frac{1}{\sqrt{N_{t + 1}}} - \frac{1}{\sqrt{N_{t}}}) = A (b) (\frac{1}{\sqrt{N_{t}}} - \frac{1}{\sqrt{N_{t + 1}}})

(31)

To scale

{\tilde{V}}^{gain}

, we divide this expression by the maximal possible difference. As

\tilde{V}

is already scaled to take on values between 0 and 1, the maximal difference is 1. To scale

C^{gain}

, we also divide by the maximal possible difference, which here is the difference in cost between a sample size of 0 and the maximal allowable sample size,

N_{\max}

, another parameter set by the user. Therefore, the maximal possible difference is

q N_{\max}^{λ}

and

L

is given by

L = w_{1} (\frac{A (b) (\frac{1}{\sqrt{N_{t}}} - \frac{1}{\sqrt{N_{t + 1}}})}{1}) + w_{2} (\frac{q N_{t + 1}^{λ} - q N_{t}^{λ}}{q N_{\max}^{λ}})

(32)

where the w’s are the weights attributed to each minimization goal. Here, we assume that the weights sum to unity, so

w_{2} = 1 - w_{1}

.

L

simplifies to

L = w_{1} A (b) (\frac{1}{\sqrt{N_{t}}} - \frac{1}{\sqrt{N_{t + 1}}}) + (1 - w_{1}) (\frac{N_{t + 1}^{λ} - N_{t}^{λ}}{N_{\max}^{λ}})

(33)

As we set the differences between sequential sample sizes to be linear (Equation (24)), we can fully determine s by only evaluating the first two samples,

N_{1}

and

N_{2}

.

N_{1}

is a constant set by the user, and

N_{2} = 2 N_{1} + 2 s

(setting

t = 2

in Equation (25)), and we can substitute these values into

N_{t}

and

N_{t + 1}

, respectively:

L = w_{1} A (b) (\frac{1}{\sqrt{N_{1}}} - \frac{1}{\sqrt{2 N_{1} + 2 s}}) + (1 - w_{1}) (\frac{{(2 N_{1} + 2 s)}^{λ} - N_{1}^{λ}}{N_{\max}^{λ}})

(34)

3.5. Setting Constraints on s

Not all real values of s are allowed. In sequential sampling, later samples are added to prior samples, so

N_{t + 1} > N_{t}

. Additionally, we cannot allow any strategies that generate larger sample sizes than the limit set by the user, so

N_{T} \leq N_{\max}

. These set lower and upper bounds on s, respectively.

To find

s_{\min}

, we set

N_{T} = N_{T - 1} + 1

, which is

T N_{1} + s_{\min} \frac{(T - 1) (T + 2)}{2} = (T - 1) N_{1} + s_{\min} \frac{(T - 2) (T + 1)}{2} + 1

(35)

and solve for

s_{\min}

:

s_{\min} = \frac{1 - N_{1}}{T}

(36)

To find

s_{\max}

, we set

N_{T} = N_{\max}

, which is

N_{\max} = T N_{1} + s_{\max} \frac{(T - 1) (T + 2)}{2}

(37)

and solve for

s_{\max}

s_{\max} = \frac{2 (N_{\max} - T N_{1})}{(T - 1) (T + 2)}

(38)

Therefore,

\frac{1 - N_{1}}{T} \leq s \leq \frac{2 (N_{\max} - T N_{1})}{(T - 1) (T + 2)}

(39)

3.6. Minimizing $L$ in the Acceptable Domain of s

To optimize

L

, we first find the derivative of Equation (34):

\frac{d L}{d s} = \frac{A (b) N_{\max}^{λ} w_{1} + 2 λ (1 - w_{1}) {(2 N_{1} + 2 s)}^{λ + 1 / 2}}{N_{\max}^{λ} {(2 N_{1} + 2 s)}^{3 / 2}}

(40)

By definition,

λ > 0, N_{1} > 1

,

N_{\max} > 1

, and

A (b) > 0

, meaning this derivative is always positive when

2 N_{1} + 2 s > 0

. This is least likely to be true when

s = s_{\min}

. Given Equation (36),

s_{\min}

will be at its minimum when

N_{1} = N_{\max}

, so

N_{\max} + \frac{1 - N_{\max}}{T} > 0

(41)

which is true so long as

T \geq 1

, which is another necessary constraint of sequential sampling. This means that

\frac{d L}{d s}

is always positive in the acceptable domain, so the objective function increases monotonically with s and its minimum will be at the lower boundary

s_{\min}

. This means that regardless of the value of

λ

and

w_{1}

, the optimal value of s is never positive (it equals zero only when the initial sample size is 1).
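The monotonicity argument can be checked with a coarse grid search over the admissible range of s. The sketch below evaluates Equation (34) directly; it is a hedged illustration in which the function names, the grid resolution, and the example parameters are assumptions, and the upper bound on s is obtained by solving Equation (37) for s.

```python
import math

def a_factor(b):
    """Correction factor A(b) of Equation (21)."""
    log_ratio = math.lgamma(b / 2) - math.lgamma((b - 1) / 2)
    return math.sqrt(2.0 / (b - 1)) * math.exp(log_ratio)

def objective(s, n1, n_max, b, lam, w1):
    """Objective of Equation (34), with N_2 = 2 N_1 + 2 s."""
    n2 = 2 * n1 + 2 * s
    info = a_factor(b) * (1 / math.sqrt(n1) - 1 / math.sqrt(n2))
    cost = (n2 ** lam - n1 ** lam) / n_max ** lam
    return w1 * info + (1 - w1) * cost

def s_bounds(n1, t_max, n_max):
    """Lower bound from Equation (36); upper bound from solving Equation (37) for s."""
    s_min = (1 - n1) / t_max
    s_max = 2 * (n_max - t_max * n1) / ((t_max - 1) * (t_max + 2))
    return s_min, s_max

def argmin_on_grid(n1, t_max, n_max, b, lam, w1, points=101):
    """Return the grid value of s in [s_min, s_max] with the smallest objective."""
    s_min, s_max = s_bounds(n1, t_max, n_max)
    grid = [s_min + (s_max - s_min) * i / (points - 1) for i in range(points)]
    return min(grid, key=lambda s: objective(s, n1, n_max, b, lam, w1))

if __name__ == "__main__":
    # Example settings (assumptions): N_1 = 5, T = 14, N_max = 60, b = 8.
    for lam in (0.5, 1.0, 2.0):
        print(lam, argmin_on_grid(5, 14, 60, 8, lam, w1=0.5), s_bounds(5, 14, 60)[0])
```

For each λ tried, the grid minimum coincides with the lower bound, in line with the derivative argument above.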

With this optimal value of s, we can finally define a function for selecting sequential samples by substituting

s_{\min}

in for s in Equation (25):

N_{t} = t N_{1} + (\frac{1 - N_{1}}{T}) \frac{(t - 1) (t + 2)}{2}

(42)

which simplifies to:

N_{t} = \frac{2 t T N_{1} + t^{2} - t^{2} N_{1} + t - t N_{1} - 2 + 2 N_{1}}{2 T}

(43)
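The schedule implied by Equation (43) can be generated in a few lines. The sketch below is an illustration under two stated assumptions: the right-hand side is read as the cumulative sample size at step t (which is how Equations (35) and (37) use it), and non-integer values are rounded to the nearest integer. The function name is hypothetical.

```python
from fractions import Fraction

def sample_schedule(n1, t_max):
    """Cumulative sample sizes N_1..N_T from the closed form with s = s_min."""
    s_min = Fraction(1 - n1, t_max)  # Equation (36), kept exact
    sizes = []
    for t in range(1, t_max + 1):
        exact = t * n1 + s_min * Fraction((t - 1) * (t + 2), 2)
        sizes.append(round(exact))  # rounding to the nearest integer is an assumption
    return sizes

if __name__ == "__main__":
    print(sample_schedule(5, 14))
```

With an initial sample of 5 and 14 steps, this reproduces the sequence 5, 9, 14, 17, 21, 24, 27, 30, 32, 35, 36, 38, 39, 40 reported for the recommended strategy in Section 3.9.2.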

3.7. Stopping Rule for Determining Equivalency

The main benefit of sequential sampling is that if two processes are not equivalent then early samples will show this. Larger sample sizes would incur an unnecessary cost. A mismatch between the reference and candidate processes would manifest as a larger than expected

\tilde{V}

. However, under the traditional hypothesis testing framework, an early sample with

\tilde{V}

within its expected bounds is not necessarily proof that the null hypothesis (the two processes are equivalent) is true. Only repeated failures to reject the null hypothesis would provide evidence that it is true, or at least approximately true. Therefore, one would need to sample T times if the two processes are equivalent, but if the two processes are not equivalent then one would need to sample fewer than T times.

We would stop sampling when the measured value of

{\tilde{V}}_{t}

for sample t is greater than the upper confidence limit for this random variable at the sample size

N_{t}

for that sample. We find this limit by substituting the square root of the critical value of the

χ^{2}

distribution (with b - 1 degrees of freedom) into the expression for

{\tilde{V}}_{t}

for a given significance level

α

:

C I_{t}^{+} = \frac{1}{\sqrt{N_{t} (b - 1)}} \sqrt{χ_{α, b - 1}^{2}}

(44)
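A self-contained sketch of the threshold in Equation (44) is given below. Because the Python standard library has no chi-square quantile function, the critical value is found by bisection on the upper-tail probability, which has a finite-sum form for integer degrees of freedom. This is an illustrative implementation with hypothetical function names, not the routine used in the AMEquivalency package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(x / 2)), 1  # exact tail for df = 1
    else:
        tail, k = math.exp(-x / 2), 2             # exact tail for df = 2
    while k < df:  # each pass extends the tail from k to k + 2 degrees of freedom
        tail += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return tail

def chi2_critical(alpha, df):
    """Value exceeded with probability alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def upper_limit(n_t, b, alpha):
    """Stopping threshold of Equation (44) for cumulative sample size n_t."""
    return math.sqrt(chi2_critical(alpha, b - 1) / (n_t * (b - 1)))

def stop_sampling(v_tilde_t, n_t, b, alpha=0.01):
    """True when the observed normalized V reaches the threshold (non-equivalence)."""
    return v_tilde_t >= upper_limit(n_t, b, alpha)

if __name__ == "__main__":
    for n_t in (5, 14, 40):
        print(n_t, round(upper_limit(n_t, b=8, alpha=0.01), 3))
```

The threshold tightens as the cumulative sample size grows, so a given departure from the reference that is tolerated at an early step can still trigger a stop later.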

It should also be noted that a root cause analysis should be performed when a sample is declared non-equivalent. A few bad parts can throw off the

{\tilde{V}}_{t}

statistic, so if it can be determined that the part was malformed due to a one-time event (e.g., an operator mistake), then it can be thrown out and built again. If it is determined that the sample is representative of the whole process, then it cannot be discarded.

3.8. Outliers as a Complementary Diagnostic, a Recommended Secondary Analysis

The proposed equivalency workflow is intentionally nonparametric and percentile-based, which reduces sensitivity to single extreme observations by confining their influence to the outer bins. This robustness is desirable for accept/reject decisions when sample sizes are limited. Nevertheless, outliers can be operationally important in AM because they may indicate transient special-cause events (process instabilities, hardware faults, powder anomalies) or unusually favorable behavior worth understanding. We therefore recommend that outlier screening be treated as a complementary diagnostic rather than part of the equivalency decision rule; numerous robust and classical procedures for univariate and multivariate outlier identification are available in the literature (e.g., robust distance and minimum covariance determinant methods, influence-function-based diagnostics, and forward search techniques), and the reader is referred to established treatments for practical implementation and guidance [62,63,64].

In practice, if equivalency is rejected, investigators should examine which bins (often tails) contribute most to

χ^{2}

and whether a small number of observations dominate those bins, then conduct a root-cause analysis to distinguish metrology artifacts from genuine process excursions. Even when equivalency is accepted, reviewing tail observations can help identify rare but high-consequence failure modes (or opportunities for improvement). Outlier-handling rules should be specified a priori to prevent post hoc bias.

3.9. Finding Optimal Values of T and $N_{1}$ with Power Analyses

The parameters T and

N_{1}

can be set at the discretion of the user based on the cost and time constraints of the experiment. However, different combinations of these values yield different discriminatory capacities, so these capacities should also be considered during experimental design. Here, we perform power analyses via simulation to determine the values of T and

N_{1}

that minimize N while achieving a power of 0.8 or higher and keeping the false positive rate low.

3.9.1. Simulation Description

In this context, the null hypothesis is that the candidate distribution is equivalent to the reference. The alternative hypothesis is that they differ in some way. Power is the probability that a statistical test will determine that two processes are not equivalent when this is in fact the ground truth. Under the null hypothesis, there is an equal probability that any observation will occur in any bin. The alternative hypothesis is that the probability distribution is not uniform. As the order in which the probabilities differ does not matter for the calculation of the

χ^{2}

statistic (Equation (8)), we choose to vary the probabilities with a truncated geometric distribution. When the singular parameter of this distribution (

ρ)

is high, then the probability clusters to early bins (Figure 5; [65]). When

ρ

approaches 0, then it produces a near uniform distribution; thus, by altering a single parameter we can produce a wide array of potential experimental outcomes:

p_{i} = \frac{{(1 - ρ)}^{i - 1} ρ}{1 - {(1 - ρ)}^{b}} \quad \text{for } i = 1, 2, \dots, b

(45)
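Equation (45) translates directly into code. In the sketch below, the function name is hypothetical, and the case ρ = 0 (where the formula reduces to 0/0) is handled by returning the uniform distribution, which is the limit of the expression as ρ approaches 0.

```python
def truncated_geometric(rho, b):
    """Bin probabilities of Equation (45); rho = 0 is treated as the uniform limit."""
    if rho == 0:
        return [1.0 / b] * b
    norm = 1 - (1 - rho) ** b
    return [(1 - rho) ** (i - 1) * rho / norm for i in range(1, b + 1)]

if __name__ == "__main__":
    for rho in (0.0, 0.1, 0.5, 0.9):
        print(rho, [round(p, 3) for p in truncated_geometric(rho, 8)])
```

Larger values of ρ concentrate the probability in the first bins, while values near 0 give nearly equal probabilities across all bins.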

We set

b = 8

to account for the possibility that users will set the sample size for the reference distribution to be as small as possible. While we show the optimal sampling strategy under this constraint, we also give a blueprint for how such simulation studies can be conducted in future work that allows for larger sample sizes and bin counts. This optimization framework is also implemented in the AMEquivalency package [44].

To run these simulations, we first find the values of

N_{1}

and T which give a final sample size of 40. Each sampling strategy can be summarized as

N_{1} : T

, and there are 8 strategies where

N_{T} = 40

. These are

1 : 40

,

2 : 27

,

5 : 14

,

6 : 12

,

17 : 5

,

29 : 3

,

39 : 2

,

40 : 1

. Note that s is either 0 or negative for each of these experiments, so the change in sample sizes either stays the same (as is the case for

1 : 40

) or decreases until the difference between the last and second-to-last samples is 1. Once these experimental parameters are set, we use Equation (43) to determine the sample size sequence. Next, for half of our simulations, we randomly draw a value for

ρ

between 0 and 1 from a uniform distribution to set the probability that an observation will occur in each bin. For the other half of our simulations, we set

ρ = 0

. We then sequentially sample from this truncated geometric distribution. For each sample t, we calculate

{\tilde{V}}_{t}

(Equation (4)). We then compare the sample’s

{\tilde{V}}_{t}

to

C I_{t}^{+}

(Equation (5)) for a given value of

α

. If

{\tilde{V}}_{t} < C I_{t}^{+}

, then we continue sampling. If

{\tilde{V}}_{t} \geq C I_{t}^{+}

, then the processes are not equivalent and sampling ceases. If this condition is never met, then sampling continues until

t = T

. In cases where

ρ > 0

, we calculate power (the probability that the sampling strategy correctly rejects the null hypothesis) as well as the final sample size where this determination was made. In cases where

ρ = 0

, we calculate the false positive rate: the probability that the null hypothesis was rejected even though it is true. For every sampling strategy, we run 100 simulations where the null hypothesis is true and another 100 where the null hypothesis is false. There is a set value of

ρ

for each simulation. We repeat each simulation 100 times and calculate power and the average sample size at the conclusion of sampling when

ρ > 0

. When

ρ = 0

, we calculate the false positive rate. In total, there are

8 \times 2 \times 100 \times 100

= 160,000 simulations. We also set

α = 0.01

.
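A compact version of this simulation, for a single strategy and a fixed value of ρ, might look like the sketch below. Several simplifications are assumptions made for brevity: the 5 : 14 schedule and the critical value 18.475 (significance level 0.01, 7 degrees of freedom) are hard-coded, ρ is fixed per call rather than drawn from a uniform distribution, and the function names, seed, and run count are hypothetical.

```python
import random

SCHEDULE = [5, 9, 14, 17, 21, 24, 27, 30, 32, 35, 36, 38, 39, 40]  # 5 : 14 strategy
B = 8
CHI2_CRIT = 18.475  # chi-square critical value for alpha = 0.01 and b - 1 = 7 df

def bin_probs(rho, b=B):
    """Truncated geometric probabilities of Equation (45); rho = 0 means uniform."""
    if rho == 0:
        return [1.0 / b] * b
    norm = 1 - (1 - rho) ** b
    return [(1 - rho) ** i * rho / norm for i in range(b)]

def run_once(rho, rng):
    """Return (rejected, final_sample_size) for one sequential experiment."""
    probs = bin_probs(rho)
    counts = [0] * B
    n = 0
    for n_t in SCHEDULE:
        # Add only the new observations needed to reach the cumulative size n_t.
        for i in rng.choices(range(B), weights=probs, k=n_t - n):
            counts[i] += 1
        n = n_t
        expected = n / B
        chi2 = sum((o - expected) ** 2 / expected for o in counts)
        # Comparing the normalized V with its upper limit is equivalent to comparing
        # chi-square with its critical value: both share the factor 1 / sqrt(N_t (b - 1)).
        if chi2 >= CHI2_CRIT:
            return True, n
    return False, n

def rejection_rate(rho, runs=500, seed=7):
    """Share of runs that reject equivalency, plus the mean final sample size."""
    rng = random.Random(seed)
    results = [run_once(rho, rng) for _ in range(runs)]
    rate = sum(rejected for rejected, _ in results) / runs
    mean_n = sum(n for _, n in results) / runs
    return rate, mean_n

if __name__ == "__main__":
    print("rho = 0.5:", rejection_rate(0.5))  # rate estimates power
    print("rho = 0.0:", rejection_rate(0.0))  # rate estimates the false positive rate
```

With ρ greater than 0 the rejection rate estimates power, and with ρ equal to 0 it estimates the false positive rate; the mean final sample size shows how early the stopping rule tends to fire.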

3.9.2. Finding Optimal Sampling Strategy

As this is a multi-objective optimization problem where we want to maximize power (discrimination level) while minimizing final sample size and the false positive rate, we use desirability functions to combine these 3 variables into a single metric and find the strategy that maximizes this composite metric [66]. Each of these performance measures indexed as

j \in {1, 2, 3}

can be recast as a desirability

d_{j}

. When the goal is to maximize the underlying performance measure, as is the case with power (

x_{1}

), we use the following function:

d_{1} = \{\begin{matrix} 0, & if x_{1} \leq L_{1} \\ {(\frac{x_{1} - L_{1}}{Q_{1} - L_{1}})}^{k}, & if L_{1} < x_{1} < Q_{1} \\ 1, & if x_{1} \geq Q_{1} \end{matrix}

(46)

where

L_{1}

is the lowest acceptable power value,

Q_{1}

is the target value of the power, and the exponent k determines how important it is to hit the target.

0 < k < 1

produces a concave function,

k = 1

produces a linear function, and

k > 1

produces a convex function. Here we set

L_{1} = 0

and

Q_{1} = 1

.

Next, we set the desirabilities for sample size (

x_{2}

) and the false positive rate (

x_{3}

). In both of these cases, we want to minimize the performance metric, so we use the following functions to do it:

d_{2} = \{\begin{matrix} 1, & if x_{2} \leq Q_{2} \\ {(\frac{x_{2} - U_{2}}{Q_{2} - U_{2}})}^{k}, & if Q_{2} < x_{2} < U_{2} \\ 0, & if x_{2} \geq U_{2} \end{matrix}

(47)

d_{3} = \{\begin{matrix} 1, & if x_{3} \leq Q_{3} \\ {(\frac{x_{3} - U_{3}}{Q_{3} - U_{3}})}^{k}, & if Q_{3} < x_{3} < U_{3} \\ 0, & if x_{3} \geq U_{3} \end{matrix}

(48)

where Q is still the target value and U is the upper limit beyond which the desirability is 0. Here we set

Q_{2} = 8

,

U_{2} = 41

,

Q_{3} = 0

and

U_{3} = 1

. We also set

k = 2

for each desirability function.

Next, we introduce the weight

ω_{j}

to set the relative importance of each variable, where the weights sum to unity,

\sum_{j} ω_{j} = 1

. If

ω_{j} = 0

, then the j’th variable is unimportant and will not influence the final optimization result. In contrast, if

ω_{j} = 1

, then j’th variable is maximally important and will be the only variable that influences the final result.

With these weights, we can introduce the overall desirability D, which is the product of the desirabilities with their associated weight set as the exponent:

D = \prod_{j} d_{j}^{ω_{j}}

(49)

As two of these performance metrics are evaluated when the null hypothesis is false (the reference and candidate distributions are not equivalent) and only one is calculated when the null hypothesis is true, we set the weights so that the null-is-false case does not overwhelm the null-is-true case:

ω = [1 / 4, 1 / 4, 1 / 2]

.
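The desirability calculation can be sketched as follows, using the limits, exponent, and weights stated in the text (lower limit 0 and target 1 for power, target 8 and upper limit 41 for sample size, target 0 and upper limit 1 for the false positive rate, k = 2, and weights 1/4, 1/4, 1/2). The function names and the example inputs are hypothetical.

```python
def desirability_max(x, low, target, k):
    """Larger-is-better desirability, Equation (46)."""
    if x <= low:
        return 0.0
    if x >= target:
        return 1.0
    return ((x - low) / (target - low)) ** k

def desirability_min(x, target, upper, k):
    """Smaller-is-better desirability, Equations (47) and (48)."""
    if x <= target:
        return 1.0
    if x >= upper:
        return 0.0
    return ((x - upper) / (target - upper)) ** k

def overall_desirability(power, mean_n, fpr, weights=(0.25, 0.25, 0.5), k=2):
    """Weighted product of Equation (49) with the settings given in the text."""
    d = (
        desirability_max(power, 0.0, 1.0, k),
        desirability_min(mean_n, 8, 41, k),
        desirability_min(fpr, 0.0, 1.0, k),
    )
    result = 1.0
    for d_j, w_j in zip(d, weights):
        result *= d_j ** w_j
    return result

if __name__ == "__main__":
    # Hypothetical performance values for one sampling strategy.
    print(round(overall_desirability(power=0.85, mean_n=15, fpr=0.05), 3))
```

Because the individual desirabilities are multiplied, a strategy that scores 0 on any one measure receives an overall desirability of 0 regardless of how well it performs on the others.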

We find that D peaks at

N_{1} : T = 5 : 14

(Figure 6). Here, the expected sample size (given the null is not true) is minimal while the power level is above

0.8

and the false positive rate is around

0.05

. It therefore has some desirable properties which make it a good candidate as a sampling strategy. This sampling strategy corresponds to the sequence

N_{t} = {5, 9, 14, 17, 21, 24, 27, 30, 32, 35, 36, 38, 39, 40}

.

3.10. Summarized Steps of Equivalency

Box 1 gives the minimal amount of information needed to perform experiments using the equivalency process. It gives quick descriptions of variable names, their recommended values, and the equations needed to test for equivalency. All analyses in this paper are reproducible using the accompanying R package AMEquivalency (which is publicly available at the repository [44]), which implements reference-process stability checks, percentile-based binning, sequential sampling schedules, computation of

χ^{2}

and

\tilde{V}

, and the stopping rule. The package also supports simulation-based power analyses for selecting

(N_{1}, T)

under user-defined constraints. An introduction on how to use the package is also available in the repository.

3.11. Univariate Equivalency Based on Variation Limits

There are numerous potential ways to determine equivalency between two different distributions even beyond traditional goodness-of-fit tests. One alternative approach is based on tolerance-bound statistics such as

T_{99}

and

T_{1}

, which are used in specification-based qualification frameworks for metallic materials, including MMPDS-style procedures [27]. A

T_{99}

value is a one-sided lower tolerance limit, representing a

95 %

lower confidence limit on the first percentile of the distribution, which can be computed via parametric or nonparametric means [27]. A

T_{1}

value is the upper tolerance limit of the

95 %

interval for the 99th percentile of that distribution. To establish equivalency, one computes these statistics for the reference and candidate distributions and checks whether the candidate limits are nested within the reference limits (Figure 7). If so, the processes are deemed equivalent; otherwise they are not. Significant work remains to determine whether such an option is feasible.

Still, obvious comparisons can be made between this method and our bin-based method. In a sense, the bin-based method subsumes this variation limit method, as the bin-based method would be sensitive to differences in the mean, variability, and other moments of the reference and candidate distributions. Conversely, the proposed approach may be less sensitive to some mean shifts when the candidate distribution is substantially less variable than the reference distribution. This occurs because the method evaluates discrepancies through candidate counts in reference-defined percentile bins rather than through the mean alone. If the candidate mean is shifted but the candidate distribution is also relatively narrow, many observations may remain concentrated within a small number of central bins, which can limit the overall discrepancy in bin counts. In such cases, a change in location does not necessarily produce a strong equivalency signal unless enough observations are displaced across multiple bin boundaries.

4. Validation Case Study

We performed a set of experiments designed to test the efficacy of the univariate equivalency method. In these experiments, a single process is used to create a reference and a candidate distribution, while a second process is used to create a second candidate distribution. If the equivalency package works as intended, then the first candidate distribution should be equivalent to the reference, while the second candidate distribution should differ from it. We first discuss how the experiment was performed, and then we go through the steps highlighted in Figure 1 to demonstrate how the package should be used and the results of the case study.

4.1. Experimental Methods

Plate scans were selected for the validation of the univariate equivalency method proposed in this work. Unlike three-dimensional parts, plate scans consist of a single-layer build in which vectors or melt-pools are isolated for a layer-wise analysis. This approach also reduces the time required to obtain and measure samples. In this case, two LPBF systems, AconityMIDI+ and SLM280 HL, were used to produce the populations required for the univariate equivalency method. The AconityMIDI+ (Aconity3D GmbH, Herzogenrath, Germany) features a continuous-wave Yb-fiber laser (CFL-500, nLIGHT, Vancouver, WA, USA) with a maximum power of 500 watts that is controlled by a scanner head co-developed by Aconity3D and Raylase GmbH (Weßling, Germany). This system was selected to create the reference population since it offers the user extensive configurability of multiple parameters. The same machine was also used for the candidate population that, theoretically, should be classified as equivalent by the method proposed here. For the “non-equivalent” candidate population, the SLM280 HL (Nikon SLM Solutions AG, Lübeck, Germany) system was used. This machine is equipped with two 400-watt, continuous-wave Yb-fiber lasers and a SCANLAB intelliSCAN III 20 scanner controller (SCANLAB GmbH, Puchheim, Germany). In contrast to the AconityMIDI+, the SLM280 is a more closed system that offers users less access to parameter configuration.

To standardize the process and limit external confounding variables, all tests were printed under the same conditions. Each plate was positioned at the center of the build plate of the corresponding system, with the same orientation relative to the flow direction. The surface of the plates was aligned to the focal plane in the z-axis. All plate scans were also produced under manufacturing conditions, maintaining the build chamber oxygen concentration below 1000 ppm.

Following the steps described in this publication, the reference population was produced by consecutively printing 20 subgroups ($m = 20$), or plates, of 2 samples each ($n = 2$). Each sample was manually positioned at the center of the build platform before activating the laser to continue with the next subgroup. For the candidate populations, the samples were printed according to the sampling strategy with the sequence $N_{t} = \{5, 9, 14, 17, 21, 24, 27, 30, 32, 35, 36, 38, 39, 40\}$. To expedite data gathering for the purpose of validating the univariate equivalency method in this publication, the 40 samples for both candidate populations were printed with a 20-min interval between sample increments to accommodate the change of plates between runs. The data were not, however, processed with the univariate equivalency method at this phase. By omitting intermediate data-processing steps while preserving the planned validation framework, the overall experiment duration was reduced relative to a fully sequential implementation in which data would be processed and evaluated after each sampling stage. If such intermediate processing is retained, the framework can still reduce the total number of samples required, but the calendar-time savings may be smaller because each sequential step must be followed by measurement, analysis, and a stopping decision before additional samples are collected.
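To make the sampling sequence concrete, the short Python sketch below converts the cumulative sample sizes quoted above into the number of new samples printed at each stage. The function name `stage_increments` is hypothetical and is not part of the AMEquivalency package; only the sequence itself is taken from the text.

```python
# Cumulative candidate sample sizes N_t from the sampling sequence in the text.
N_T = [5, 9, 14, 17, 21, 24, 27, 30, 32, 35, 36, 38, 39, 40]


def stage_increments(cumulative):
    """Return the number of new samples printed at each sequential stage."""
    previous = [0] + cumulative[:-1]
    return [current - prior for current, prior in zip(cumulative, previous)]


increments = stage_increments(N_T)

# Every stage adds at least one sample, and the increments sum to the final size.
assert all(step > 0 for step in increments)
assert sum(increments) == N_T[-1]

for stage, (total, step) in enumerate(zip(N_T, increments), start=1):
    print(f"stage {stage:2d}: print {step} new sample(s), cumulative N = {total}")
```

The increments shrink toward the end of the sequence, so later stages add only one or two samples before the next evaluation.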

Once the plates were printed, a Keyence VHX-7000 4K microscope (Keyence Corporation, Osaka, Japan) was used to image the samples at 150× magnification. The geometry selected for the validation experiment consisted of a quadrilateral with angle values of 45°, 90°, and 135°. The Keyence VHX-7000 software was used to measure the corner deviation of the 135° corner. This measurement quantified the deviation between the scanned corner and the intersection of the centerlines of the vectors forming the corner, as illustrated in Figure 8. The centerlines of the vectors were created manually and used to generate a bisector guideline that crossed the inner and outer edges of the corner. Then the distance from the intersection of the centerlines to both edges of the corner vectors along the bisector guideline was measured, ensuring that both measurements were taken at the same angle. The corner deviation value was calculated by adding the measurements to both edges. Measurements falling within the bisector were assigned “positive” values, while those outside the bisector were assigned “negative” values. An ideal corner would have a corner deviation of zero, indicating that the centerline intersection is equidistant from the vector edges of the corner. Corner deviation was selected as the metric for this study because it can be influenced by polygon delay, a hidden parameter in some LPBF systems. Equivalency was determined using the available R Package [44].

4.2. Experimental Results

The corner deviations measured on the AconityMIDI+ were found to be stable according to the Shewhart control chart (Figure 9A), which indicates that the process can produce a viable reference distribution. Furthermore, sequentially sampled corner deviations from the AconityMIDI+ were shown to be equivalent to the reference distribution ($\tilde{V}_{T} = 0.207 < CI_{T}^{+} = 0.257$; $df = b - 1 = 7$; $N_{T} = 40$; Table 2; Figure 9B). However, corner deviations measured from traces produced by the SLM280 HL were found to be non-equivalent at the first sampling step ($\tilde{V}_{1} = 1 > CI_{1}^{+} = 0.727$; $df = b - 1 = 7$; $N_{1} = 5$; Table 2; Figure 9C). Thus, the univariate equivalency method classified the candidate process expected to be equivalent as equivalent, while determining at a small sample size that the second process was non-equivalent to the reference. In applications where non-equivalency is detected, a natural next step is to inspect tail bins and potential outliers to identify whether the divergence reflects a systematic shift or rare special-cause excursions.
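The bin-based comparison behind these statistics can be illustrated with a minimal Python sketch. It is not the AMEquivalency implementation: the exact normalization of $\tilde{V}$ and the construction of $CI^{+}$ are not restated in this section, so the sketch assumes the standard goodness-of-fit form of Cramér's V, $\sqrt{\chi^{2} / (N (b - 1))}$, with equal expected counts per bin. The function names, the percentile convention, and the data are hypothetical.

```python
import bisect
import math
import statistics


def percentile_bin_limits(reference, b=8):
    """Interior bin limits: the b - 1 empirical quantiles of the reference sample."""
    return statistics.quantiles(reference, n=b)


def bin_counts(candidate, limits):
    """Count candidate observations in each of the len(limits) + 1 bins."""
    counts = [0] * (len(limits) + 1)
    for value in candidate:
        counts[bisect.bisect_right(limits, value)] += 1
    return counts


def cramers_v(counts):
    """Goodness-of-fit Cramér's V against equal expected counts (assumed form)."""
    total, b = sum(counts), len(counts)
    expected = total / b
    chi_square = sum((observed - expected) ** 2 / expected for observed in counts)
    return math.sqrt(chi_square / (total * (b - 1)))


# Hypothetical data: 40 reference values and 5 candidates beyond the reference range.
reference = list(range(1, 41))
limits = percentile_bin_limits(reference)
print(cramers_v(bin_counts([100] * 5, limits)))
```

Under this assumed form, the statistic reaches its maximum of 1 when all candidate observations fall in a single bin and equals 0 when they are spread evenly across the bins.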

5. Discussion

One major challenge for the maturation of additive manufacturing as an industrial technology is its ability to scale [67]. While economies of scale are achievable for AM generally [5], there are a number of significant costs that make it expensive for mass production relative to traditional manufacturing techniques [10]. One major cost is the cost of testing for the purposes of qualification, certification, or in-house validation [68]. There is also a lack of standards for AM processed objects [69], resulting in huge variation in quality among additively manufactured parts. Our goal with developing this univariate equivalency workflow is to provide a reliable and cost-friendly approach for comparing a candidate process to a stable reference with respect to a single response variable. This workflow is implemented in an open-source R package (AMEquivalency) for reproducibility and ease of adoption [44]. We demonstrated the method using a validation case study based on geometric measurements of quadrilateral traces (plate scans), correctly classifying a candidate process expected to be equivalent to the reference and detecting a non-equivalent process at a small sample size. We also briefly discussed an alternative equivalency strategy based on tolerance limits [27], and argued that the bin-based method provides a more general distributional comparison that is sensitive to differences in multiple moments (mean, variability, and higher-order features).

While this study is motivated in part by qualification challenges in additive manufacturing, equivalency testing should not be viewed as a standalone replacement for existing qualification standards. Significant additional work is still needed to determine how distributional equivalency in specific metrics relates to part performance, reliability, and defect tolerance across applications, and how univariate, and ultimately multivariate, equivalency evidence should be incorporated into formal qualification and certification frameworks. Even so, distributional equivalency has important uses beyond formal qualification. Potential applications include machine-to-machine comparability, parameter-set change control, supplier comparisons, requalification after hardware or software updates, calibration of in situ monitoring proxies against ex situ measurements, and early-stage process development in which the goal is to match a known-good baseline while minimizing the number of test artifacts. The proposed workflow is especially valuable in settings where sample production, post-processing, or destructive evaluation is sufficiently expensive that early stopping can yield meaningful practical savings. In some LPBF workflows, however, many specimens can be produced in a single build and paired with high-throughput evaluation strategies, which reduces the marginal benefit of sequential sampling [43]. By contrast, the framework may be particularly useful for lower-throughput processes, expensive qualification artifacts, machine-to-machine comparability studies, requalification after process changes, or any setting in which only a few artifacts can be produced or evaluated at a time [70]. The LPBF case study presented here should therefore be interpreted primarily as a validation of the statistical workflow rather than as evidence that LPBF is universally the setting of greatest benefit. 
In these near-term contexts, the workflow is best positioned as a practical tool for process comparability and change control, supporting principled accept/reject decisions relative to a reference baseline while explicitly accounting for cost constraints through sequential sampling.

Equivalency testing is not the only way to compare reference and candidate distributions. Alternative methods for comparing continuous distributions without binning include empirical-distribution-function procedures such as the Kolmogorov–Smirnov (KS) test and distance-based methods such as Wasserstein metrics [71,72]. These approaches preserve the continuous scale of the data and can be more sensitive than a discretized method to certain fine-grained differences between distributions. However, they do not by themselves resolve the experimental-design problem addressed here: how to construct a sequential, interpretable, and cost-aware decision rule for small-sample AM equivalency testing. We therefore view such methods as useful complements rather than direct replacements for the present framework. To make this comparison explicit, Appendix D compares the empirical power of the proposed method with that of the two-sample KS test under the same simulation logic used throughout the manuscript.

The validation case study provides more than a simple proof of implementation. Most importantly, it shows that the sequential workflow behaves differently in the two practically relevant regimes that motivated its development. For the candidate population produced on the same system as the reference, the method did not trigger an early non-equivalency decision and instead continued through the full planned sample size, indicating a conservative tendency when evidence remains consistent with equivalency. By contrast, the candidate population produced on the second LPBF system was rejected at the first sequential step, showing that large distributional departures can be detected with very little sampling effort. This asymmetric behavior is desirable in many qualification and change-control settings: strong evidence of non-equivalency can be identified quickly, whereas claims of equivalency require sustained agreement with the reference distribution across the full sampling sequence. At the same time, the case study should be interpreted as a validation of the statistical workflow rather than as a comprehensive validation of equivalency for LPBF more generally. The example is based on a single response metric derived from plate scans, and additional work will be needed to determine how equivalency in such metrics relates to broader questions of part performance, reliability, and certification.

This work also highlights several directions for future research. First, the present framework evaluates only a single attribute of a candidate build, underscoring the need for a multivariate extension that can assess equivalency across multiple correlated or independent attributes (e.g., tensile strength, surface roughness, and geometric accuracy) simultaneously and thereby support a more holistic notion of process comparability. Such an extension will be important if equivalency testing is to move beyond specification-limited comparisons and toward broader qualification and certification applications. Second, the general framework developed here may be especially valuable in other additive manufacturing settings in which throughput is lower and qualification artifacts are more expensive, including processes such as laser directed energy deposition and extrusion-based AM. Because only one or a few parts may be built simultaneously in these processes, extending the method to them may help clarify where sequential sampling provides the greatest practical advantage. Third, sequential sampling may also be leveraged to reduce sample-size requirements for establishing reference distributions when artifacts are extremely expensive, provided that process stability can still be assessed reliably. Finally, when non-equivalency is detected, follow-on diagnostic analyses (for example, targeted investigation of tail behavior or outliers as possible indicators of special-cause variation) may help identify why a candidate process diverged from the reference and guide corrective actions.

6. Conclusions

This study introduced a nonparametric framework for assessing univariate equivalency between additive-manufacturing processes through comparison of full empirical distributions rather than isolated summary statistics. By characterizing the reference process with percentile-defined bins and evaluating candidate data through a sequential sampling rule, the method provides an interpretable accept/reject workflow that is explicitly designed for settings in which sample production and evaluation are costly. The principal advantage of the framework is therefore not that it replaces broader qualification standards, but that it offers a practical and statistically grounded tool for process comparability, requalification, and change control when only limited sample sizes are feasible.

The results show that the method can distinguish between equivalent and non-equivalent candidate distributions while supporting early stopping when strong evidence of non-equivalence is present. More generally, the framework provides a way to compare processes on the basis of their observed distributions without requiring parametric assumptions about distributional form. This makes it attractive for AM applications in which normality may not hold, tail behavior may matter, and investigators wish to retain information about variability and distributional shape rather than relying only on means or specification-based checks.

At the same time, the method should be interpreted within its intended scope. It evaluates observed distributions for a single response metric and assumes that both process stability and measurement-system adequacy have been established before equivalency testing is applied. Accordingly, it is best viewed as a component of a broader validation or monitoring workflow rather than as a standalone replacement for qualification and certification standards. Its greatest practical benefit is expected in settings where artifacts are expensive, throughput is limited, and early stopping can reduce the number of required samples. In higher-throughput settings, the main value of the method may lie less in reducing calendar time and more in providing a principled and interpretable framework for process comparison.

Future work should extend this approach to multivariate equivalency, strengthen the treatment of measurement uncertainty, and further evaluate its applicability across additional AM settings, materials, and process conditions relevant to industrial qualification workflows. Even in its present form, however, the proposed workflow establishes a useful foundation for distribution-based process comparison in additive manufacturing and provides a practical step toward more rigorous and sample-efficient equivalency assessment in this field.

Author Contributions

C.M.L.: Conceptualization, Methodology, Software, Data curation, Writing—original draft, Visualization, Writing—review & editing. R.V.: Conceptualization, Methodology, Supervision, Validation, Writing—review & editing. B.L.V.M.: Writing—original draft, Experimental Design, Methodology, Investigation. C.G.G.: Software, Formal Analysis. J.M.: Writing—review & editing, Experimental Design, Methodology. R.B.W.: Investigation, Conceptualization, Writing—review & editing. All authors have read and agreed to the published version of the manuscript.

Funding

The research described here was performed at The University of Texas at El Paso (UTEP) within the W.M. Keck Center for 3D Innovation (Keck Center). This material is based on research sponsored by Air Force Research Laboratory under Agreement Number FA8650-20-2-5700. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Additional support was provided by strategic investments via discretionary UTEP Keck Center funds and the Mr. and Mrs. MacIntosh Murchison Chair I in Engineering Endowment at UTEP. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory or the U.S. Government.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data supporting the findings of this study are contained within this article.

Acknowledgments

The authors acknowledge the use of generative artificial intelligence for language editing and code development. During the preparation of this manuscript, the authors used ChatGPT (OpenAI, GPT-5.2, 2026) to assist with editing and refining text and for generating and debugging functions used in the computational analyses. The authors have reviewed and edited all AI-generated output and take full responsibility for the content of this publication.

Conflicts of Interest

Author Colin Lynch was employed as a contractor by Terra Integrated Solutions. Author J. Rene Villalobos is the founder of Terra Integrated Solutions. Author Brenda Valadez was employed by Tailored Alloys. Author Jorge Mireles was employed by Additive Manufacturing Education Partners. Author Ryan Wicker was employed by Additive Manufacturing Education Partners and is the co-founder of Trusted Metal. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AM	Additive Manufacturing
ASTM	ASTM International (formerly American Society for Testing and Materials)
CI	Confidence Interval
CUSUM	Cumulative Sum (control chart)
EWMA	Exponentially Weighted Moving Average (control chart)
ISO	International Organization for Standardization
LPBF	Laser Powder Bed Fusion
MMPDS	Metallic Materials Properties Development and Standardization (Handbook)
NASA	National Aeronautics and Space Administration
PBF-LB/M	Laser-Based Powder Bed Fusion of Metals
ppm	Parts per million
SPC	Statistical Process Control
$χ^{2}$	Chi-square (goodness-of-fit) statistic
$\tilde{V}$	Normalized Cramér’s V (goodness-of-fit effect size)

Appendix A. Bin Stability for Reference Distribution

To determine whether a sample size of $N = mn = 40$ is sufficient to characterize the reference distribution (where $m$ is the number of subgroups used to test for stability and $n$ is the number of samples within a subgroup), we test whether the bin limits are sufficiently stable at this sample size for $b = 8$. That is, we want to know whether the empirical percentiles of a sample align well with the percentiles of the population at $N = 40$ across a range of distributions. To test this, we randomly sample from four common distribution types (exponential, lognormal, normal, and uniform) at sample sizes $N = 8, 9, \dots, 100$. The parameters governing the shapes of these distributions are set randomly and are used to find the true percentiles of the populations. For each sample, we find the empirical percentiles and calculate the sum of squared errors between these percentiles and the true ones. We repeat this process 250 times for each sample size and report the average sum of squared errors.
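The simulation just described can be sketched in a few lines of Python. This is a simplified illustration rather than the code used for Figure A1: it fixes a single standard normal population instead of drawing random parameters across four families, and the function names, the seed, and the percentile convention are assumptions.

```python
import random
import statistics


def bin_limit_sse(sample_size, b=8, rng=random):
    """SSE between empirical and true interior bin limits for one N(0, 1) sample."""
    population = statistics.NormalDist(mu=0.0, sigma=1.0)
    true_limits = population.quantiles(n=b)  # the b - 1 true percentiles
    sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    empirical_limits = statistics.quantiles(sample, n=b)
    return sum((e - t) ** 2 for e, t in zip(empirical_limits, true_limits))


def mean_sse(sample_size, repeats=250, b=8, seed=0):
    """Average the SSE over repeated samples of the same size."""
    rng = random.Random(seed)
    return statistics.fmean(bin_limit_sse(sample_size, b, rng) for _ in range(repeats))


for size in (8, 20, 40, 100):
    print(size, round(mean_sse(size), 4))
```

Extending the sketch to the exponential, lognormal, and uniform families would only require swapping the sampler and the true-percentile calculation.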

We find that for each distribution the sum of squared errors decays exponentially with sample size (Figure A1). In each case, the error starts to saturate near $N = 40$, so while it can be decreased further with larger sample sizes, the improvement exhibits diminishing returns.

Figure A1. Sum of squared errors between empirical and true bin limits for different distribution types across various sample sizes. The dashed line represents a sample size of 40.

Appendix B. Continuously Monitoring the Reference Process

After the candidate process has been qualified, the reference process may still be used for further production (Figure A2). In this case, the process should continuously be monitored via control charts, and future sample points can be concatenated to existing points to extend the control chart. If the process remains in control, these datapoints can be pooled into the existing dataset to update the bin limits, although these limits will stabilize at larger sample sizes.
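A minimal Python sketch of this monitor-then-pool logic follows. The text specifies only a Shewhart control chart, so the sketch assumes an X-bar chart with range-based limits for subgroups of size 2 (using the standard constant $A_{2} = 1.880$) and flags only points beyond the limits. The function names and the data are hypothetical.

```python
import statistics

A2_N2 = 1.880  # standard X-bar chart constant for subgroups of size n = 2


def xbar_limits(subgroups):
    """Lower limit, center line, and upper limit of an X-bar chart."""
    means = [statistics.fmean(pair) for pair in subgroups]
    ranges = [max(pair) - min(pair) for pair in subgroups]
    center = statistics.fmean(means)
    half_width = A2_N2 * statistics.fmean(ranges)
    return center - half_width, center, center + half_width


def in_control(subgroups, new_subgroups):
    """True if every new subgroup mean lies within limits set by existing data."""
    lower, _, upper = xbar_limits(subgroups)
    return all(lower <= statistics.fmean(pair) <= upper for pair in new_subgroups)


def pool_if_stable(subgroups, new_subgroups):
    """Append new subgroups only when they plot inside the existing limits."""
    if in_control(subgroups, new_subgroups):
        return subgroups + new_subgroups
    return subgroups


# Hypothetical reference subgroups followed by one new subgroup.
existing = [(1.0, 1.2), (0.9, 1.1), (1.1, 1.3), (1.0, 1.0)]
print(len(pool_if_stable(existing, [(1.0, 1.1)])))
```

After pooling, the bin limits would be recomputed from the enlarged reference dataset. A fuller implementation would also apply run rules and a companion range chart.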

Figure A2. Flow diagram for how to include new data for the reference distribution, modified from Figure 1A to reflect the addition of new data from the reference process.

Appendix C. Equivalency Testing Versus Specification

Qualification of additive manufacturing processes can follow different statistical frameworks, with two prominent approaches being specification-based methods, as codified in MMPDS-16 Chapter 9 [27], and the type of equivalency testing we propose here. Specification-based qualification focuses on verifying that a candidate process meets minimum performance requirements relative to a reference baseline. In practice, this often involves constructing tolerance or confidence intervals for strength or other material properties and ensuring that these intervals lie above predefined thresholds. This method has the advantages of simplicity and regulatory familiarity, providing a clear pass–fail criterion that assures minimum material performance. Its limitation, however, is that it is insensitive to broader distributional features: two processes may both exceed the threshold yet differ significantly in their variability, defect structure, or tail behavior, all of which are critical for long-term reliability and fatigue performance. This framework also tends to ignore the stability of a process.

By contrast, equivalency testing shifts the focus from minimum thresholds to distributional similarity. Here, the goal is to determine whether the candidate process produces parts statistically indistinguishable from those produced by a trusted reference process. Instead of verifying that the candidate meets a minimum requirement, equivalency requires that the full distribution of relevant properties (central tendency, spread, and higher-order moments) align with that of the reference, although these moments do not need to be calculated and compared directly. Our nonparametric approach implicitly captures these distributional features. This framework is particularly powerful when qualification depends not only on avoiding failures but also on replicating established performance envelopes. The cost of this rigor is that equivalency testing is more computationally demanding, requires larger or more carefully structured sample sets, and may reject processes that are “good enough” but differ in distributional form.

Importantly, equivalency can be seen as subsuming specification: if two distributions are shown to be equivalent, then it necessarily follows that the candidate meets or exceeds the minimum specification satisfied by the reference. However, the reverse does not hold; meeting a threshold specification does not imply process equivalency, since substantial distributional differences may remain undetected.

Figure A3. Comparison of qualification strategies using equivalency testing and MMPDS Chapter 9 specification procedures. (A) Distribution of the counts of the final sample size using sequential sampling and univariate equivalency. (B) Kernel densities of a reference distribution (blue) and a candidate distribution (red). The lines show the $T_{99}$ and $T_{1}$ statistics for each distribution.

We further test the capabilities of both approaches by sampling from additional types of distributions. For each family $F \in \{\text{Normal}, \text{Lognormal}, \text{Gamma}, \text{Weibull}\}$, we fix a reference model with mean $\mu_{R}$ and standard deviation $\sigma_{R}$. Candidate models are constructed by shifting only the mean to $\mu_{C} = \mu_{R} + \delta \sigma_{R}$ for $\delta \in \{0, 0.2, \dots, 2.0\}$ while preserving the family’s shape parameters.
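As an illustration of the mean-shift construction, the Python sketch below builds the grid of shifts and derives a shifted lognormal candidate. It reflects one possible reading of "preserving the family's shape parameters", namely holding the log-scale standard deviation fixed and solving for the log-scale location that yields the target mean. The function names and parameter values are hypothetical, and the other families are not covered.

```python
import math


def shift_grid(step=0.2, max_shift=2.0):
    """Shifts delta = 0, 0.2, ..., 2.0 in units of the reference standard deviation."""
    count = round(max_shift / step)
    return [round(i * step, 10) for i in range(count + 1)]


def lognormal_moments(mu_log, sigma_log):
    """Mean and standard deviation of a lognormal with log-scale parameters."""
    mean = math.exp(mu_log + sigma_log ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma_log ** 2) - 1)
    return mean, sd


def shifted_lognormal(mu_log, sigma_log, delta):
    """Candidate parameters with mean mu_R + delta * sigma_R and unchanged sigma_log."""
    mean_ref, sd_ref = lognormal_moments(mu_log, sigma_log)
    target_mean = mean_ref + delta * sd_ref
    return math.log(target_mean) - sigma_log ** 2 / 2, sigma_log


for delta in shift_grid():
    mu_c, sigma_c = shifted_lognormal(0.0, 0.5, delta)
    print(delta, round(lognormal_moments(mu_c, sigma_c)[0], 4))
```

Under this reading, the candidate standard deviation scales with the candidate mean because the shape parameter, not the variance, is held fixed.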

Figure A4. Comparison of Chapter 9 qualification and the proposed univariate equivalency method across Lognormal, Normal, Gamma, and Weibull candidate distributions as the candidate mean is shifted from the reference mean. (A) Mean power of each method across the four distribution families. (B) Mean false positive rate of each method across the four distribution families; the horizontal dotted line indicates the set significance level. (C) Mean sample size required by the proposed univariate equivalency method as a function of the shift from the reference mean, with colored lines denoting the four distribution families; the black dashed line indicates the fixed Chapter 9 sample size. (D) Probability of concluding non-equivalence as a function of the shift from the reference mean for each distribution family. In panel (D), the solid red line denotes Chapter 9 and the dashed blue line denotes the proposed univariate equivalency method; the horizontal dotted line indicates the set significance level.

Appendix D. Comparison with the Kolmogorov–Smirnov Test

Univariate equivalency is only one method which can be used to compare whole distributions. Other methods exist as well which do not require reducing the resolution of the distributions in order to compare the reference and candidate populations. We therefore compare the power of our percentile-bin method with that of the two-sample Kolmogorov–Smirnov (KS) test under a common simulation framework. This comparison is useful because the KS test is one of the most widely used nonparametric procedures for comparing two continuous samples, and therefore serves as a natural benchmark for positioning the proposed approach relative to a familiar goodness-of-fit method.

The KS test compares the empirical cumulative distribution functions (ECDFs) of the reference and candidate samples. Its test statistic is

$$D_{n,m} = \sup_{x} \left| F_{n}(x) - G_{m}(x) \right|, \qquad \text{(A1)}$$

where $F_{n}(x)$ and $G_{m}(x)$ are the ECDFs of the reference and candidate samples, respectively. Thus, the KS test rejects the null hypothesis when the largest vertical discrepancy between the two ECDFs is sufficiently large. In contrast, the proposed univariate equivalency method first characterizes the reference distribution with empirical percentile bins and then evaluates whether the candidate observations are distributed uniformly across those bins, as expected under equivalency. In other words, the KS test is driven by a single maximal discrepancy, whereas the proposed method aggregates departures from equivalency across the full set of bins.
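For readers who want to see the statistic in Equation (A1) computed directly, a minimal pure-Python sketch is given below. The function names are hypothetical, the example data are invented, and no p-value or critical value is computed; in practice an established routine such as R's `ks.test` would normally be used.

```python
import bisect


def ecdf(sorted_sample, x):
    """Fraction of observations less than or equal to x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(reference, candidate):
    """Two-sample KS statistic: largest absolute gap between the two ECDFs."""
    ref, cand = sorted(reference), sorted(candidate)
    # Both ECDFs are step functions that change only at observed values,
    # so the supremum is attained at one of the pooled observations.
    return max(abs(ecdf(ref, x) - ecdf(cand, x)) for x in ref + cand)


# Hypothetical samples: the candidate is shifted upward by two units.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

Because only the single largest gap is retained, several moderate discrepancies in different regions of the support do not accumulate in this statistic.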

To make the comparison commensurate with the rest of the manuscript, we retained the same baseline sample-size logic used throughout the paper. The reference sample size was fixed at $N_{\text{ref}} = 40$, consistent with the recommendation that $N = 40$ is the smallest practical sample size that simultaneously supports stability assessment and an 8-bin characterization of the reference distribution. The candidate sample size was also fixed at $N_{\text{cand}} = 40$, and the proposed method used $b = 8$ bins. Candidate distributions were then generated by shifting the mean of the reference distribution by $\delta \sigma_{R}$, where $\delta$ is measured in units of the reference standard deviation, while also allowing a controlled number of extreme outliers to be inserted into the candidate sample.

Figure A5 shows the resulting power surfaces. Panels A and B report the empirical power of the proposed method and the KS test, respectively, and Panel C reports their difference, with green regions indicating scenarios in which the proposed method achieved higher power than the KS test. Across the simulation settings considered here, the percentile-bin method often showed greater power than the KS test, especially when the candidate distribution differed from the reference through a combination of moderate mean shift and contamination by a limited number of extreme observations.

Figure A5. Comparison of the power of the proposed univariate equivalency method and the two-sample Kolmogorov–Smirnov test. Panel (A) shows the empirical power of the proposed method, Panel (B) shows the empirical power of the KS test, and Panel (C) shows the difference in power between the two methods (green indicates higher power for the proposed method; red indicates higher power for the KS test). The horizontal axis gives the number of outliers inserted into the candidate sample, and the vertical axis gives the mean shift of the candidate distribution relative to the reference in units of the reference standard deviation.

There are several statistical reasons why this pattern may occur. First, the KS test is based on the supremum norm: it only uses the single largest gap between the two ECDFs. When the difference between the reference and candidate distributions is spread across multiple regions of the support, or when the discrepancy is concentrated in the tails but does not create a very large maximum ECDF gap, the KS statistic may not increase substantially. By contrast, the proposed method pools information across all bins, so multiple moderate deviations can accumulate into a stronger global signal. Second, the bins are defined by empirical reference percentiles, so each bin is expected to have equal probability under equivalency. This creates a balanced representation of the support of the reference distribution and can make shifts in either central tendency or tail occupancy easier to detect through coordinated count imbalances. Third, discretization reduces the effective complexity of the comparison. At the sample sizes considered here, collapsing the continuous comparison into a modest number of equal-probability bins can lower variance enough to improve detection of practically relevant departures from equivalency. Finally, the KS test is an omnibus procedure designed to be sensitive to a very broad class of alternatives; this distribution-free generality can come at the cost of lower power for specific alternatives that are especially relevant in AM qualification problems, such as moderate process shifts accompanied by sparse but consequential tail events.

These results should not be interpreted as showing that the proposed method is uniformly more powerful than the KS test under all alternatives. Rather, they indicate that for the specific equivalency problem studied here—small-to-moderate sample sizes, reference-informed percentile binning, and alternatives motivated by process shifts and outlier contamination—the proposed method can provide a substantial gain in discriminatory power. This is consistent with our broader motivation: qualification and requalification in additive manufacturing often require methods that are robust, distribution-agnostic, and sensitive to practically meaningful deviations in the full response distribution rather than only to differences in a single summary statistic.

Figure 1. (A) Process for establishing the reference distribution. (B) Process for determining whether or not the candidate process produces a distribution of the performance metric that is equivalent to that of the reference distribution produced in (A). Here, $T$ is the maximum number of samples that the experimentalist is willing to take, and $N_{1}$ is the sample size of the first sample.

Figure 2. Example Shewhart control chart. Each point represents the mean of a subgroup, the middle line represents the global mean, and the two dotted lines represent the upper and lower control limits.

Figure 3. Relationship between $b$ and the correction factor $A(b)$. As $b$ increases, $A(b)$ approaches 1 (dashed line).

Figure 4. Examples of hypothesized sampling strategies for different cost functions. The solid line shows how cost scales with sample size, while the dashed lines show the sizes of sequential samples.

Figure 5. Example simulations for determining equivalency. (A) Probability distribution across bins for an equivalent candidate process, where the probability that an observation falls in each bin is equal. When $ρ$ approaches 0, $p_{1} \approx p_{2} \approx \dots \approx p_{b} \approx 1/b$. (B) Probability distribution across bins for a non-equivalent candidate process, where $0 < ρ \leq 1$, so $p_{1} \neq p_{2} \neq \dots \neq p_{b}$. (C) Sequential sampling results for the equivalent case shown in panel (A). The solid line shows the observed value of the test statistic across sampling steps, while the dashed line shows the confidence interval bound. Because all sampled values remain within the bound, the candidate and reference processes are concluded to be equivalent. (D) Sequential sampling results for the non-equivalent case shown in panel (B). The solid line shows the observed value of the test statistic across sampling steps, while the dashed line shows the confidence interval bound. Because the second sample falls outside the bound, the candidate and reference processes are concluded to be not equivalent.

Figure 6. The x-axis in each panel gives the sampling strategies, represented as $N_{1} : T$. Three of the panels show how well each sampling strategy performs with respect to a particular performance metric. The final panel, (D), gives the overall desirability, $D$, of these three metrics. The maximum occurs at $N_{1} : T = 5 : 14$.

Figure 7. Illustration of an alternative method of determining equivalency using experimental data from Section 4.1. Here, we measure the $T_{99}$ and $T_{1}$ of the reference distribution (visualized with a kernel density function), which are given by the vertical dashed lines. We then measure these same statistics for the candidate distributions (green and red intervals). In our case study, the candidate distribution produced by the AconityMIDI+ is equivalent to that of the reference process (green interval), but the candidate distribution produced by SLM280 HL is not (red interval).

Figure 8. Quadrilateral traces used for corner deviation measurements. (A) Full subgroup image with vector center lines. (B) Direction of positive and negative deviations from center lines. (C) Graphical representation of the corner deviation measurement.

Figure 9. (A) Shewhart $\bar{x}$ control chart for the reference process. All subgroup means fall within the control limits and no run rules are violated, indicating that the process is stable and suitable for characterizing the reference distribution. (B) Final bin counts obtained from sequential sampling of a candidate process that is equivalent to the reference. Observations are approximately uniformly distributed across the percentile-defined bins, yielding $\tilde{V}_{T} < {CI}_{T}^{+}$ and therefore no stopping event. (C) Final bin counts for a candidate process that is not equivalent to the reference. Counts are strongly concentrated in a subset of bins, producing $\tilde{V}_{1} \geq {CI}_{1}^{+}$ and triggering early termination of sampling.

Table 1. Rules for Shewhart control charts. Meeting one or more indicates the process is not in control. $σ$ = one standard deviation.

Test	Rule	Problem Indicated
1	1 point outside control limits.	Large shift.
2	8/9 points on the same side of center line.	Small sustained shift.
3	6 points steadily increasing or decreasing.	Trend or drift.
4	14 points alternating up and down.	Systematic variation.
5	2 of 3 points $> 2 σ$ on same side.	Medium shift.
6	4 of 5 points $> 1 σ$ on same side.	Small shift.
7	15 points within $1 σ$ of center line.	Stratification.
8	8 points on both sides, none within $1 σ$ .	Mixture pattern.
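
As a minimal sketch of how rules of this kind can be checked programmatically, the Python functions below implement Tests 1, 2, and 3 for a sequence of subgroup means. The function names, argument conventions, and example values are hypothetical. The run length for Test 2 is left as a parameter because the table lists it as 8/9, and Test 3 is read here as six consecutive points, each strictly above (or strictly below) its predecessor.

```python
def rule1_outside_limits(means, lcl, ucl):
    """Test 1: indices of points outside the control limits."""
    return [i for i, m in enumerate(means) if m < lcl or m > ucl]


def rule2_same_side(means, center, run=9):
    """Test 2: indices at which `run` consecutive points lie on one side of center."""
    hits, streak, side = [], 0, 0
    for i, m in enumerate(means):
        s = (m > center) - (m < center)  # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            streak += 1
        else:
            side, streak = s, (1 if s != 0 else 0)
        if streak >= run:
            hits.append(i)
    return hits


def rule3_trend(means, run=6):
    """Test 3: indices at which `run` consecutive points are strictly monotone."""
    hits, up, down = [], 1, 1
    for i in range(1, len(means)):
        up = up + 1 if means[i] > means[i - 1] else 1
        down = down + 1 if means[i] < means[i - 1] else 1
        if up >= run or down >= run:
            hits.append(i)
    return hits


# Hypothetical subgroup means with an upward drift and one large excursion.
subgroup_means = [10.1, 9.9, 10.2, 10.0, 10.3, 10.4, 10.6, 10.7, 10.9, 13.5]
center_line, lower_limit, upper_limit = 10.0, 8.5, 11.5  # assumed chart parameters

violations = {
    "test 1": rule1_outside_limits(subgroup_means, lower_limit, upper_limit),
    "test 2": rule2_same_side(subgroup_means, center_line),
    "test 3": rule3_trend(subgroup_means),
}
in_control = not any(violations.values())
print(violations, "in control:", in_control)
```

Returning the indices at which each rule fires, rather than a single flag, makes it easier to see which subgroups triggered the signal before deciding whether the reference process is stable.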

Table 2. Sequential sampling results for equivalent and non-equivalent candidate processes.

Case	t	N	$χ^{2}$	$\tilde{V}$	${CI}_{t}^{+}$	Decision
Equivalent	1	5	9.40	0.518	0.727	Continue
Equivalent	2	9	11.4	0.426	0.542	Continue
Equivalent	3	14	11.1	0.337	0.434	Continue
Equivalent	4	17	10.8	0.301	0.394	Continue
Equivalent	5	21	12.1	0.287	0.355	Continue
Equivalent	6	24	12.7	0.275	0.332	Continue
Equivalent	7	27	8.85	0.216	0.313	Continue
Equivalent	8	30	9.47	0.212	0.297	Continue
Equivalent	9	32	10.0	0.211	0.287	Continue
Equivalent	10	35	11.4	0.216	0.275	Continue
Equivalent	11	36	10.7	0.206	0.271	Continue
Equivalent	12	38	11.3	0.206	0.264	Continue
Equivalent	13	39	10.8	0.199	0.260	Continue
Equivalent	14	40	12.0	0.207	0.257	Stop (final sample)
Non-equivalent	1	5	35.0	1.000	0.727	Stop (non-equivalent)
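
The tabulated values can be checked numerically. The sketch below is an inference from the printed numbers, not a quotation of the paper's definitions: it assumes b = 8 bins, computes $\sqrt{χ^{2} / (N (b - 1))}$ for each row, and compares an upper bound of the form $\sqrt{q / (N (b - 1))}$, where q = 18.475 is the 0.99 quantile of the chi-square distribution with 7 degrees of freedom. Under these assumptions the computed values agree with the table to rounding; the function and variable names are hypothetical.

```python
import math

B = 8                  # assumed number of bins (inferred, not stated in the table)
CHI2_Q99_DF7 = 18.475  # 0.99 quantile of chi-square with 7 degrees of freedom

# (case, t, N, chi2, V_tilde, CI_plus) transcribed from Table 2
rows = [
    ("Equivalent", 1, 5, 9.40, 0.518, 0.727),
    ("Equivalent", 2, 9, 11.4, 0.426, 0.542),
    ("Equivalent", 3, 14, 11.1, 0.337, 0.434),
    ("Equivalent", 4, 17, 10.8, 0.301, 0.394),
    ("Equivalent", 5, 21, 12.1, 0.287, 0.355),
    ("Equivalent", 6, 24, 12.7, 0.275, 0.332),
    ("Equivalent", 7, 27, 8.85, 0.216, 0.313),
    ("Equivalent", 8, 30, 9.47, 0.212, 0.297),
    ("Equivalent", 9, 32, 10.0, 0.211, 0.287),
    ("Equivalent", 10, 35, 11.4, 0.216, 0.275),
    ("Equivalent", 11, 36, 10.7, 0.206, 0.271),
    ("Equivalent", 12, 38, 11.3, 0.206, 0.264),
    ("Equivalent", 13, 39, 10.8, 0.199, 0.260),
    ("Equivalent", 14, 40, 12.0, 0.207, 0.257),
    ("Non-equivalent", 1, 5, 35.0, 1.000, 0.727),
]


def effect_size(chi2, n, b=B):
    """Chi-square rescaled by its maximum N(b - 1), then square-rooted."""
    return math.sqrt(chi2 / (n * (b - 1)))


def upper_bound(n, b=B, crit=CHI2_Q99_DF7):
    """Assumed bound: the same rescaling applied to a chi-square critical value."""
    return math.sqrt(crit / (n * (b - 1)))


decisions = []
for case, t, n, chi2, v_tab, ci_tab in rows:
    v, ci = effect_size(chi2, n), upper_bound(n)
    stop = v >= ci  # crossing the bound ends sampling with "non-equivalent"
    decisions.append(stop)
    print(f"{case:15s} t={t:2d} N={n:2d} V={v:.3f} (table {v_tab:.3f}) "
          f"bound={ci:.3f} (table {ci_tab:.3f}) stop={stop}")
```

With b = 8, the largest possible $χ^{2}$ at N = 5 is N(b - 1) = 35, which corresponds to all five observations falling in a single bin and matches the non-equivalent row.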
