Practical Multivariate Equivalency Testing for Additively Manufactured Parts: Comparing Independent and Dependent Cases

Lynch, Colin M.; Villalobos, Rene; Valadez Mesta, Brenda Leticia; Gomez Guillen, Cesar; Mireles, Jorge; Wicker, Ryan B.

doi:10.3390/jmmp10070229


by Colin M. Lynch ^{1,*}, Rene Villalobos ^{1,2}, Brenda Leticia Valadez Mesta ^{3}, Cesar Gomez Guillen ^{3}, Jorge Mireles ^{4} and Ryan B. Wicker ^{4,5,6}

^{1} Terra Integrated Solutions, Mesa, AZ 85202, USA
^{2} School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ 85281, USA
^{3} W.M. Keck Center for 3D Innovation, University of Texas at El Paso, El Paso, TX 79968, USA
^{4} Additive Manufacturing Education Partners, El Paso, TX 79912, USA
^{5} Department of Aerospace and Mechanical Engineering, University of Texas at El Paso, El Paso, TX 79968, USA
^{6} Trusted Metal, Evans, GA 30809, USA
^{*} Author to whom correspondence should be addressed.

J. Manuf. Mater. Process. 2026, 10(7), 229; https://doi.org/10.3390/jmmp10070229

Submission received: 25 May 2026 / Revised: 25 June 2026 / Accepted: 26 June 2026 / Published: 30 June 2026

(This article belongs to the Special Issue Advances in Metal Additive Manufacturing: Processes, Materials, and Applications)


Abstract

Additive manufacturing (AM) requalification and change-control workflows often require evidence that a candidate machine, parameter set, scanner subsystem, facility, or measurement workflow remains comparable to a stable reference process after a change, but fabrication and testing costs limit exhaustive multifeature studies. The aim of this study was to address this engineering design problem by developing a practical multifeature equivalency screening framework for AM settings in which prior engineering evidence already suggests that the candidate process should be comparable to the reference process. Building on prior work focused on the univariate problem, the proposed framework uses reference-defined percentile bins, feature-wise distributional tests, and family-wise error-rate control to screen for evidence of non-equivalency across multiple measured attributes. A direct joint-binning approach was first shown to become sample-intensive as dimensionality increases, after which an independent feature-wise method and an exploratory dependent bivariate extension were developed. Simulation-based power analyses quantified the trade-offs among power, detectable effect size, distributional resolution, feature count, and the combined costs of fabrication and measurement. In a laser-based powder bed fusion validation study with 40 observations per process and three corner-deviation features, the expected-equivalent AconityMIDI+ candidate satisfied all feature-wise equivalency criteria ($\tilde{V} = 0.207$–$0.214 < CI^{+} = 0.276$), whereas the expected non-equivalent SLM280 HL candidate failed all three feature-wise tests ($\tilde{V} = 0.357$–$1.000 > CI^{+} = 0.276$). These results support multivariate equivalency as a requalification screening tool for AM process comparability and change control, while emphasizing that it should not be interpreted as proof of physical-process identity or as a replacement for first-time formal qualification. Core procedures are implemented in the open-source R package MultivariateEquivalency.

Keywords: equivalency; additive manufacturing; laser-based powder bed fusion; qualification; multivariate

1. Introduction

The evaluation of machines, processes, and facilities designed to produce additively manufactured parts often requires measurements of multiple quantities [1]. These quantities can include geometric accuracy, tensile strength, elastic modulus, porosity, and other metrics that capture different aspects of part quality [2,3,4]. Standards vary widely both within subfields of additive manufacturing (AM), such as laser-based powder bed fusion of metals (PBF-LB/M) [5,6,7,8,9,10,11], and across fields where multiple institutions have attempted to implement AM qualification standards [12,13,14,15,16]. This lack of consensus can be attributed to at least three recurring challenges: the high cost of producing and measuring samples [17], uncertainty about which properties should be measured [18], and high inter-process variability [19].

Equivalency testing provides one route for addressing these challenges in requalification and change-control settings. In this context, the objective is not to establish a new process from first principles, but to determine whether a candidate process, already supported by prior engineering evidence, shows distributional evidence of departure from a stable reference process. The univariate equivalency method introduced in [20] formalized this comparison for a single measured response by defining a reference distribution from a stable process, partitioning that distribution into percentile-based bins, and evaluating candidate observations against the resulting reference structure. This framing makes equivalency testing most appropriate as a cost-conscious screen for detecting non-equivalency rather than as proof that two physical processes are identical or as a replacement for first-time formal qualification. However, many AM requalification decisions depend on several measured attributes simultaneously, so the central problem addressed here is how to extend this distributional, reference-based logic to multifeature comparisons while accounting for the practical costs of specimen fabrication, destructive testing, and metrology.

A univariate framework is insufficient for many AM requalification and change-control decisions because part quality is inherently multidimensional. A candidate process may appear comparable with respect to one metric while differing substantially with respect to another. For example, tensile strength can be a poor indicator of underlying defects such as porosity in PBF-LB/M parts [21], even though such defects can reduce fatigue life [22]. A multifeature screening framework is therefore needed when several dimensions of part quality are relevant to the requalification claim.

The research gap is practical as well as statistical. Existing AM comparison workflows often rely on single-response tests, mean-based comparisons, large-sample material-allowables strategies, or methods that are not directly connected to the cost of fabrication and measurement. In contrast, requalification studies often operate in low-sample, high-cost regimes where the key design questions are how many specimens or build-level artifacts must be produced, how many informative features should be measured from each specimen, and which measurements can be avoided once non-equivalency is detected. These costs include not only build preparation and machine time, but also materials, inert gases, energy, operator labor, post-processing, non-destructive inspection, and destructive tests [23,24,25,26]. The high cost of AM qualification has been noted in aerospace and standards-oriented contexts, where qualification may require many physical specimens, repeated build iterations, and extensive destructive and non-destructive testing [27,28,29,30]. Dedicated test artifacts may eventually reduce this burden by consolidating multiple informative measurements into compact build layouts [31]; however, a near-term statistical workflow must also work with existing specimens, coupons, and measurement plans.

The framework also responds to the inter-process variability that is characteristic of AM. AM machines deploy many parameters that allow fine control of the build process, including parameters that may be inaccessible to users or implemented differently across machines [32,33,34]. Differences in scanner behavior, scan strategy, contour parameters, build location, powder history, post-processing, or measurement systems can produce multiple correlated or partially independent changes in part attributes [35,36,37,38,39,40,41]. The reference process must therefore be shown to be stable before equivalency testing is performed, and feature selection should consider whether measured attributes capture distinct failure modes or redundant manifestations of the same underlying process state.

The aim of this study was to address the engineering problem of multifeature AM requalification under sample-size, measurement-cost, and process-variability constraints. The prior univariate framework supplies the reference-bin decision logic for one measured response; the present study generalizes that logic only as needed to evaluate several responses within one requalification workflow. Specifically, the manuscript develops an independent feature-wise multivariate decision rule with Šidák family-wise error-rate control, evaluates the effect of feature dependence through simulation and an exploratory dependent bivariate extension, links sample-size estimation to a simplified cost model, and validates the workflow using a PBF-LB/M scanner-subsystem case study.

The manuscript proceeds as follows. Section 2 reviews the univariate method and explains why a direct joint-binning extension becomes sample-intensive. Section 3 introduces the independent feature-wise multivariate solution and its cost model. Section 4 evaluates the effect of dependence and presents an exploratory dependent bivariate extension. Section 5 validates the workflow using real AM data. Section 6 discusses interpretation, limitations, and future extensions.

2. Univariate Equivalency

2.1. Overview of Univariate Equivalency

The univariate equivalency framework can be understood as a special case of the broader multivariate workflow illustrated in Figure 1, in which only a single performance metric is evaluated. This approach was originally developed in [20] and serves as the foundation for the multivariate extensions presented here. The general idea behind univariate equivalency is that, in a requalification or change-control setting, a candidate process that is expected to be comparable can be screened against a stable reference process, perhaps one that has already been certified with more stringent statistical methods [11]. Observations of the reference process are first collected after stability is assessed with control charts [42]; these observations define a pragmatic reference distribution that is divided into b equal-probability bins whose boundaries are set by the sample percentiles. This nonparametric description of the distribution is resilient to outliers in the sense that extreme values are pooled into the boundary bins, reducing their leverage; however, explicit outlier/anomaly detection remains an important secondary analysis, especially when diagnosing non-equivalent outcomes [20]. This description also captures features of the distributions that may be missed by comparing point estimates, and it does not rely on the potentially problematic assumptions of classical goodness-of-fit tests, such as fully specified parametric distributional forms, large-sample asymptotic behavior, or continuous observations without ties or sparse categories.
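The percentile-binning step can be sketched in a few lines of Python. This is a minimal illustration, assuming that the b − 1 interior cut points are the sample quantiles returned by the standard-library `statistics.quantiles` function with its default "exclusive" method; the percentile convention used in the MultivariateEquivalency R package may differ, and the function names below are hypothetical.

```python
import bisect
import random
import statistics


def reference_bin_edges(reference, b):
    """Return the b - 1 interior cut points that split the reference sample
    into b (approximately) equal-probability bins.

    Assumption: statistics.quantiles with its default "exclusive" method;
    other percentile conventions shift the edges slightly.
    """
    return statistics.quantiles(reference, n=b)


def bin_counts(candidate, edges):
    """Count candidate observations in each reference-defined bin.

    Values below the first edge go to bin 0 and values above the last edge
    go to bin b - 1, so extreme values are pooled into the boundary bins
    instead of being discarded.
    """
    counts = [0] * (len(edges) + 1)
    for x in candidate:
        counts[bisect.bisect_right(edges, x)] += 1
    return counts


# Hypothetical simulated data: both samples come from the same distribution,
# so each of the b = 8 bins should hold roughly 40 / 8 = 5 candidate points.
rng = random.Random(1)
reference = [rng.gauss(0.0, 1.0) for _ in range(200)]
candidate = [rng.gauss(0.0, 1.0) for _ in range(40)]
edges = reference_bin_edges(reference, b=8)
print(bin_counts(candidate, edges))
```

Pooling out-of-range candidate values into the first and last bins mirrors the boundary-bin behavior described in the text, so no candidate observation is dropped.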

Once these bins are set, the experimentalist starts sequentially sampling from the candidate distribution. If the two processes are equivalent, then observations from the candidate process should be spread uniformly across the bins. If the two processes are not equivalent, then observations will tend to concentrate in a subset of the bins. This concentration is often apparent even in early samples, so sample sizes can be reduced by terminating the experiment as soon as there is enough evidence that the two processes are not equivalent. This procedure reduces sample size (by up to 67.5%, depending on the degree of deviation between the reference and candidate distributions) while targeting conservative error-rate criteria: a false-positive rate of α = 0.01 and a false-negative rate of β = 0.2, corresponding to 80% power. The latter is a widely used convention in power analysis [43], while the former reflects a conservative threshold appropriate for qualification-adjacent settings, where aerospace material allowables commonly use stringent reliability/confidence criteria such as 99% population coverage with 95% confidence [5,44]. The multivariate framework developed here preserves this sequential logic by assuming that a single multifeature object, such as an artifact, is printed and evaluated at each sampling step. Under this assumption, the univariate sequential stopping rules can be applied feature-wise while controlling for multiple comparisons across features. In many AM settings, however, objects are printed in batches rather than one at a time, meaning that fabrication costs may already be incurred before equivalency decisions are made. Future versions of the framework should therefore incorporate corrections for batched printing and batched interim analyses.

For the purposes of multivariate equivalency, it is sufficient to note that an equivalency test of sequential sample t is performed by first counting the number of observations from the candidate process that occur across b reference-defined bins and then calculating a Pearson-type chi-squared discrepancy between the observed count in bin k ($O_{k}$) and the expected count in bin k ($E_{k}$) [20,45]:

$$\chi_{t}^{2} = \sum_{k=1}^{b} \frac{(O_{k} - E_{k})^{2}}{E_{k}}, \qquad E_{k} = \frac{N_{t}}{b}$$

(1)

Here, t is the sequential sampling step, b is the number of reference-defined percentile bins, k is the bin index, $O_{k}$ is the observed candidate-process count in bin k, $E_{k}$ is the expected count in bin k under equivalency, and $N_{t}$ is the cumulative candidate-process sample size at step t. These quantities are counts, indices, or dimensionless statistics and therefore have no physical units. The statistic is then converted to a normalized Cramér-type value [20,46]:

$$\tilde{V}_{t} = \frac{\sqrt{\chi_{t}^{2} / N_{t}}}{\sqrt{b - 1}}$$

(2)

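The candidate and reference processes are then compared by testing whether $\tilde{V}_{t}$ is below the upper confidence limit $CI^{+}$, which depends on the user-specified significance level α and is calculated from the upper-tail critical value $\chi_{\alpha, b - 1}^{2}$ (the $(1 - \alpha)$ quantile) of the chi-squared distribution with $b - 1$ degrees of freedom:

$$CI^{+} = \frac{1}{\sqrt{N_{t}(b - 1)}} \sqrt{\chi_{\alpha, b - 1}^{2}}$$

(3)

Because Equations (2) and (3) share the factor $1/\sqrt{N_{t}(b - 1)}$, the comparison $\tilde{V}_{t} < CI^{+}$ is equivalent to $\chi_{t}^{2} < \chi_{\alpha, b - 1}^{2}$.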
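Equations (1) to (3) translate directly into code. The sketch below is a standard-library-only illustration, not the package implementation: the hypothetical helper `chi2_sf` evaluates the chi-squared upper-tail probability for integer degrees of freedom with a textbook recursion, `chi2_upper_critical` inverts it by bisection to obtain $\chi_{\alpha, b - 1}^{2}$, and all function names are assumptions made for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable with integer
    df, built from the df = 1 and df = 2 closed forms and the recursion
    Q(x; k + 2) = Q(x; k) + (x / 2) ** (k / 2) * exp(-x / 2) / Gamma(k / 2 + 1).
    """
    if x <= 0.0:
        return 1.0
    k = 2 if df % 2 == 0 else 1
    q = math.exp(-x / 2.0) if k == 2 else math.erfc(math.sqrt(x / 2.0))
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def chi2_upper_critical(alpha, df, tol=1e-10):
    """Value c with P(X > c) = alpha, i.e., the (1 - alpha) quantile."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_statistic(counts):
    """Equation (1): Pearson discrepancy against E_k = N_t / b."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


def v_tilde(counts):
    """Equation (2): normalized Cramér-type value."""
    n_t, b = sum(counts), len(counts)
    return math.sqrt(chi2_statistic(counts) / n_t) / math.sqrt(b - 1)


def ci_plus(n_t, b, alpha):
    """Equation (3): upper confidence limit for the given N_t, b, and alpha."""
    return math.sqrt(chi2_upper_critical(alpha, b - 1)) / math.sqrt(n_t * (b - 1))


# Hypothetical candidate counts over b = 8 reference bins (N_t = 40).
counts = [6, 4, 5, 5, 7, 3, 5, 5]
print(round(v_tilde(counts), 3), round(ci_plus(40, 8, 0.01), 3))  # 0.085 0.257
```

In this made-up example $\tilde{V}_{t}$ is well below $CI^{+}$, so the feature would be judged tentatively equivalent at this step.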

If $\tilde{V}_{t} < CI^{+}$, then the candidate process is considered tentatively equivalent to the reference process with respect to the measured feature, and sampling continues according to the predefined sequence. If the end of the sequence has been reached and $\tilde{V}_{t} < CI^{+}$ at every sequential sample, then the candidate process is considered equivalent to the reference process for that feature. If $\tilde{V}_{t} \geq CI^{+}$ at any sequential sample, then the candidate process is considered non-equivalent to the reference process for that feature and sampling terminates. This threshold is the formal decision rule used for error control; in practice, values very close to $CI^{+}$ should be treated as borderline engineering cases that may justify additional sampling, sensitivity analysis, or root-cause review before a consequential process decision is made.
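The sequential decision rule can be sketched as a loop over planned looks. This is a simplified illustration under stated assumptions: the checkpoint sizes are hypothetical, and the same critical value (the tabulated $\chi_{0.01, 7}^{2} \approx 18.475$ for α = 0.01 and b = 8) is reused at every look, whereas repeated looks require appropriate corrections to preserve error control. Expected counts below five at early looks also weaken the chi-squared approximation.

```python
import math

# Assumed input: upper-tail chi-squared critical value for alpha = 0.01
# with b - 1 = 7 degrees of freedom (standard table value).
CHI2_CRIT_B8 = 18.475


def sequential_screen(bins, checkpoints, b, chi2_crit):
    """Apply the V_t < CI+ rule at each planned look (hypothetical helper).

    bins: candidate bin indices (0..b-1) in sampling order.
    checkpoints: cumulative sample sizes N_t at which the data are examined.
    Returns ("non-equivalent", N_t) at the first look with V_t >= CI+,
    otherwise ("equivalent", final N_t).
    """
    for n_t in checkpoints:
        counts = [0] * b
        for k in bins[:n_t]:
            counts[k] += 1
        expected = n_t / b
        chi2 = sum((o - expected) ** 2 / expected for o in counts)
        v_t = math.sqrt(chi2 / n_t) / math.sqrt(b - 1)           # Equation (2)
        ci_t = math.sqrt(chi2_crit) / math.sqrt(n_t * (b - 1))   # Equation (3)
        if v_t >= ci_t:
            return "non-equivalent", n_t  # stop sampling at this look
    return "equivalent", checkpoints[-1]


uniform = [i % 8 for i in range(40)]  # perfectly spread across the 8 bins
clustered = [0] * 40                  # every observation in the first bin
looks = [16, 24, 32, 40]              # hypothetical checkpoint sizes
print(sequential_screen(uniform, looks, 8, CHI2_CRIT_B8))    # ('equivalent', 40)
print(sequential_screen(clustered, looks, 8, CHI2_CRIT_B8))  # ('non-equivalent', 16)
```

The clustered sequence is rejected at the first look, which is the mechanism by which early termination saves specimens when non-equivalency is strong.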

In this framework, “equivalency” refers to failure to detect a practically meaningful difference between the candidate distribution and the reference-defined percentile-bin distribution at the specified resolution b, significance level α, and power target. The null condition for each feature is that candidate observations fall uniformly across the reference-defined bins. Non-equivalency is inferred when the observed candidate bin counts deviate from this expectation beyond the decision threshold. Thus, the method should be interpreted as a reference-based distributional screening procedure rather than as proof that two underlying physical processes are identical.

2.2. Issues with Extending Univariate Equivalency

The univariate approach is promising but is not sufficient for a complete qualification claim under existing specification-, allowables-, or flight-critical certification workflows, because two processes can differ along many axes of part quality. A naïve extension of this equivalency approach to multiple variables would require a sample size that grows exponentially with the number of variables, making comprehensive testing infeasible in many practical scenarios. To illustrate this, consider the bivariate case in which the goal is to determine whether two processes are equivalent with respect to two attributes, $x_{1}$ and $x_{2}$. To find the joint pragmatic distribution for the reference process, one must divide both axes into b bins, which partitions the parameter space into $b^{2}$ distinct cross-bins, or cells (Figure 2A). If the two variables are independent, then observations would be uniformly distributed across the cells, just as observations are uniformly distributed across bins in the univariate case. This structure quickly leads to a combinatorial explosion in the number of cells as more variables are introduced. The chi-squared-type test used in the univariate case is commonly applied with at least five expected observations per cell so that the chi-squared approximation remains reliable, so the necessary sample size would be $5b^{2}$. However, in the simulated example shown in Figure 2A, the two variables are positively correlated (copulas were used to set the correlation value [47]), so observations tend to occur along the diagonal. The off-diagonal cells can therefore be effectively ignored and only observations in the diagonal band counted (Figure 2B), which dramatically reduces the required number of samples. In this simulation, where the two random variables have a correlation coefficient of $\rho = 0.99$, 23 cells contain observations, so the necessary sample size is $5 \times 23 = 115$. For $b = 8$, the sample size needed for the independent case would be $5b^{2} = 320$, nearly three times larger. This difference highlights the potential of leveraging correlation structure to reduce experimental burden. Generally, higher correlation values and lower values of b result in lower sample sizes (Figure 2C), but not to the point where the underlying experiment becomes trivially feasible. The space of feasible designs remains constrained, and even small increases in dimensionality lead to substantial increases in the required number of observations. Additionally, correlations tend not to reduce sample size until they are very large, around $\rho = 0.8$ when b takes its recommended value of 8. The problem only worsens for higher values of p, the number of attributes. When the variables are independent, $n = 5b^{p}$, whereas in the extreme case of perfect correlation (where there is no reason to measure more than one variable), or when the variables are assumed to be fully redundant, $n = 5b$. Thus, understanding the correlation structure among variables is not only informative for modeling, but essential for judging the feasibility of the proposed method. In AM data, such correlations can arise because multiple measured responses share the same thermal history, scan strategy, build location, powder history, post-processing route, or measurement system [36,37,38,39,40,41]. Mechanical properties such as yield strength, ultimate tensile strength, elongation, and reduction in area may therefore be partially redundant, whereas geometric, porosity, surface, and microstructural measurements may capture more distinct failure modes [40,41,48,49]. The expected correlation structure should therefore be treated as part of feature selection and study design rather than as an after-the-fact diagnostic.
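The sample-size arithmetic for joint binning, and the way correlation thins out the occupied cells, can be sketched as follows. This is an illustration under stated assumptions: a Gaussian copula is chosen here for convenience and is not necessarily the copula behind Figure 2, the five-per-cell rule is applied literally, and the function names are hypothetical. Note that the number of occupied cells also depends on how many points are simulated.

```python
import bisect
import math
import random
import statistics


def joint_binning_sample_size(b, p, occupied_cells=None):
    """Minimum n under the five-observations-per-cell rule of thumb.

    Independent features occupy all b**p cells, giving 5 * b**p; if only a
    known number of cells is occupied (e.g., a diagonal band produced by
    strong positive correlation), the requirement is 5 * occupied_cells.
    """
    cells = b ** p if occupied_cells is None else occupied_cells
    return 5 * cells


def occupied_cells_gaussian_copula(rho, b, n, seed=0):
    """Count non-empty cells when two correlated standard-normal variables
    are each cut at their own b-quantiles (Gaussian copula assumption)."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        x1.append(z1)
        x2.append(z2)
    e1 = statistics.quantiles(x1, n=b)  # marginal percentile edges
    e2 = statistics.quantiles(x2, n=b)
    cells = {(bisect.bisect_right(e1, a), bisect.bisect_right(e2, c))
             for a, c in zip(x1, x2)}
    return len(cells)


print(joint_binning_sample_size(8, 2))   # 320: independent, b = 8, p = 2
print(joint_binning_sample_size(8, 3))   # 2560: one more independent feature
print(joint_binning_sample_size(8, 1))   # 40: the fully redundant case, 5 * b
print(occupied_cells_gaussian_copula(0.99, 8, 200))
```

Going from two to three independent features multiplies the requirement by b, which is the exponential growth that motivates the feature-wise method in Section 3.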

Moreover, sequential sampling complicates any multivariate equivalency method because interim analyses are repeated across both sampling steps and measured features. Each look at the data creates an additional opportunity for a false rejection unless appropriate corrections are made. The present framework retains the sequential sampling rules developed for the univariate method and applies them under the simplifying assumption that one multifeature object, such as an artifact, is printed and evaluated at a time. This assumption allows sampling to stop as soon as any measured feature provides sufficient evidence of non-equivalence, while the feature-wise significance level is adjusted to control the family-wise error rate (the probability of making at least one false positive across a group of multiple statistical tests) across the measured attributes. However, most AM workflows print multiple parts within the same build, so the practical savings from sequential stopping may be reduced when candidate objects are produced in batches, even though sequential measurement can still reduce testing burden after fabrication. Future research should therefore extend the framework to explicitly account for batched printing and batched interim analyses, potentially using group-sequential, adaptive, or hierarchical designs to preserve error control while reflecting realistic AM production workflows.

A central implication of this construction is that the number of bins should be treated as a design parameter rather than a fixed constant. Increasing b improves distributional resolution by allowing finer distinctions among candidate and reference distributions, but it also increases the number of cells that must be characterized and therefore raises sample-size requirements. In higher dimensions, this trade-off becomes especially severe, because even moderate increases in b can produce sparse occupancy of the joint space. Accordingly, percentile binning should be viewed as both a strength and a limitation of the framework: it provides a pragmatic, nonparametric representation of the distribution, but the achievable resolution must be balanced against feasibility, statistical power, and cost.

3. Independent Multivariate Equivalency Method

The most straightforward approach to generalizing the univariate approach to a multivariate context is to treat each variable independently. In this framework, there are p attributes to be evaluated for each printed artifact. Each measured variable is independently tested for univariate equivalency, so each variable has a reference distribution against which the candidate distribution is compared. If just one of these comparisons shows that the candidate distribution is not equivalent to the reference distribution, then the entire process is not equivalent to the reference process. This independent rule is intentionally conservative and is best interpreted as a screening or gating criterion: it provides a practical decision rule when joint modeling is expensive or unavailable, but it does not claim equivalency of the full joint distribution.

To develop, validate, and optimize this approach, the workflow consists of the following steps:

Hotelling $T^{2}$ control charts are used to evaluate the stability of all variables simultaneously.
Multiple comparisons are controlled by adjusting the feature-wise significance level.
Simulations are used to determine the minimal sample size required for a specified power level, significance level, effect size, and number of measured features.
Under a desired power level of 0.8 and an effective significance level of 0.01, the minimal sample size is estimated for different numbers of measured features.
The resulting sample size is converted into an experiment-cost estimate that accounts for both specimen or build-level artifact production and each additional measurement. This cost conversion is reported in normalized artifact-cost units and illustrative monetary costs from one AM facility.
Printing and measurement costs are estimated across different numbers of variables and effect sizes. A combination of Pareto optimization and stability-related constraints is then used to identify cost-efficient experimental designs under specified power, resolution, and measurement-cost constraints.
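The cost-conversion and Pareto steps above can be sketched as follows. The linear cost model (each artifact fabricated once and each of its p features measured once, in normalized artifact-cost units) and the `Design` fields are assumptions made for illustration; the detectable effect sizes would come from the power simulations described in Section 3.3, and the numbers in the usage lines are placeholders rather than study results.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    n: int                    # candidate artifacts to fabricate
    p: int                    # features measured on each artifact
    detectable_lambda: float  # smallest effect size meeting the power target


def study_cost(design: Design, artifact_cost: float, measurement_cost: float) -> float:
    """Hypothetical linear cost: n * (artifact cost + p * per-feature cost)."""
    return design.n * (artifact_cost + design.p * measurement_cost)


def pareto_front(designs: List[Design], artifact_cost: float,
                 measurement_cost: float) -> List[Design]:
    """Keep designs not dominated in (cost, detectable_lambda), both minimized."""
    scored = [(study_cost(d, artifact_cost, measurement_cost), d.detectable_lambda, d)
              for d in designs]
    front = []
    for cost, lam, d in scored:
        dominated = any(
            c2 <= cost and l2 <= lam and (c2 < cost or l2 < lam)
            for c2, l2, _ in scored
        )
        if not dominated:
            front.append(d)
    return front


# Placeholder designs (not results from the study); costs in normalized units.
designs = [Design(20, 2, 0.30), Design(40, 2, 0.20), Design(40, 3, 0.30)]
print(pareto_front(designs, artifact_cost=10.0, measurement_cost=1.0))
```

The third placeholder design is dropped because the first one detects the same effect size at lower cost; stability-related constraints would then be applied to the surviving designs.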

Implementation note: each step in this workflow is automated in the accompanying R package, including Šidák-adjusted significance levels for multiple comparisons, Monte Carlo power analysis, and cost calculations under user-specified measurement and printing costs. Sequential sampling is available in the univariate equivalency package [20]; the present implementation applies the same sequential decision logic to the multimetric setting, with the practical interpretation that specimens may be fabricated in batches and then measured or destructively tested according to the planned sequential order. Batch-adjusted stopping rules that explicitly separate fabrication decisions from measurement decisions remain a target for future development.

3.1. Evaluating Multivariate Stability

As in the univariate case, it is essential to establish the stability of the manufacturing process before determining equivalency [20]. For univariate processes, an $\bar{x}$-chart can be used to evaluate stability. A widely adopted multivariate extension of the $\bar{x}$-chart is the Hotelling $T^{2}$ control chart [50], a multivariate statistical process control tool designed to monitor the stability of a process involving two or more correlated, normally distributed quality characteristics [51]. Unlike univariate control charts, which evaluate individual variables independently, the Hotelling $T^{2}$ chart accounts for the relationships between variables, capturing their combined variability and detecting shifts that might be missed when variables are analyzed in isolation.

The $T^{2}$ statistic is the squared Mahalanobis distance of an observation from the process mean, which takes into account the variance–covariance structure of the data. This makes the chart sensitive to both individual deviations and collective shifts across the variables. $T^{2}$ is calculated as:

$$T^{2} = (x - \mu)^{\top} \Sigma^{-1} (x - \mu)$$

(4)

where $x$ is the vector of observed values, $\mu$ is the process mean vector, and $\Sigma$ is the covariance matrix. To construct the control chart, each $T^{2}$ value is plotted against its run order and compared with the upper control limit (UCL), typically determined from an F or $\chi^{2}$ distribution depending on the sample size. Observations exceeding the UCL indicate potential out-of-control conditions, signaling the need for investigation into the cause of the deviation.
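The $T^{2}$ computation in Equation (4) and a simple chi-squared UCL can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions: the data are simulated and hypothetical, $\mu$ and $\Sigma$ are replaced by the sample mean and covariance, and the UCL helper covers only p = 2 features, for which the upper-α chi-squared quantile with 2 degrees of freedom has the closed form $-2\ln\alpha$. With estimated parameters and small samples, an F-based limit is usually preferred, consistent with the F-or-$\chi^{2}$ distinction in the text.

```python
import math
import random


def invert(matrix):
    """Gauss-Jordan inverse of a small, non-singular square matrix."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]  # partial pivoting
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def hotelling_t2(x, mu, sigma):
    """Equation (4): T^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    inv = invert(sigma)
    return sum(d[i] * inv[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def sample_mean_cov(data):
    """Sample mean vector and unbiased covariance matrix of row-wise observations."""
    n, p = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in data) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mu, cov


def chi2_ucl_two_features(alpha):
    """Chi-squared UCL for p = 2 only: the 2-df upper quantile is -2 ln(alpha)."""
    return -2.0 * math.log(alpha)


# Hypothetical correlated reference data for two features.
rng = random.Random(7)
data = []
for _ in range(40):
    z1 = rng.gauss(0.0, 1.0)
    data.append([z1, 0.6 * z1 + 0.8 * rng.gauss(0.0, 1.0)])
mu_hat, sigma_hat = sample_mean_cov(data)
ucl = chi2_ucl_two_features(0.0027)  # conventional three-sigma-equivalent rate
flags = [i for i, row in enumerate(data) if hotelling_t2(row, mu_hat, sigma_hat) > ucl]
print(flags)  # run-order indices that merit root-cause review
```

Flagged indices are treated as prompts for investigation rather than automatic exclusions, in line with the anomaly-screen interpretation discussed next.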

Stability should be established for the reference process before reference bins are defined, since an unstable reference cannot provide a trustworthy baseline. For candidate processes, stability should ideally be assessed after sampling is complete (Figure 1). Estimating control limits while sampling is still in progress can introduce noise into the estimated UCL, so a stable process may appear unstable for a few samples; a final stability assessment after data collection is therefore a useful safeguard, although it is not a substitute for prior stability evidence. In addition to stability monitoring, Hotelling’s $T^{2}$ can be interpreted as a multivariate anomaly screen: points with unusually large Mahalanobis distance merit investigation, even if the overall process is otherwise stable. Importantly, such flags are diagnostic rather than explanatory, so a flagged observation should trigger root-cause analysis (e.g., review of build logs, scan strategy changes, powder history, or metrology issues) rather than automatic exclusion.

Multivariate stability should be treated as a verified prerequisite for equivalency testing rather than as a casual assumption. Demonstrating that the reference process is in statistical control is necessary, but it is not sufficient on its own. Because equivalency decisions are made from observed data, measurement-system variation is folded into the apparent process variation unless it is independently characterized or controlled. This creates a practical ambiguity: an apparent candidate-process shift may reflect true process change, metrology noise, or both. Appendix B outlines a sensitivity framework for evaluating whether measurement-system error materially changes the estimated power, sample-size, or cost conclusions. The measurement system must also be adequately controlled, sampling conditions must be representative of the claim being made, and potential sources of temporal drift and within-build spatial heterogeneity should be controlled, randomized, or shown to be negligible relative to the intended equivalency claim. Consequently, the present framework should be applied only after both the process and the measurement system have been shown to be sufficiently stable for the reference distribution to be interpreted as a meaningful baseline.

3.2. Controlling for Multiple Comparisons

The immediate concern with the independent multivariate approach is the problem of multiple comparisons. At significance level α, each feature-wise test has probability α of a false positive (the two processes are equivalent, but the statistical test suggests otherwise). If the p tests are independent, the family-wise error rate (the probability that at least one of the p tests suggests that the two processes are not equivalent when in reality they are) is $\epsilon = 1 - (1 - \alpha)^{p}$, which exceeds α whenever p > 1. To make ε equal to the original significance level, the feature-wise rejection level α can be transformed to $\alpha'$ with the following function: $\alpha' = 1 - (1 - \alpha)^{1/p}$. This transformation corresponds to the Šidák correction and makes the family-wise error rate equal to the target level when the p feature-wise tests are independent [52,53]. Accordingly, the Šidák correction should be interpreted as controlling the probability of at least one false rejection across the measured features, not as solving the broader problem of dependence-aware equivalency of the full multivariate distribution.
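The Šidák adjustment and the resulting feature-wise gate can be sketched as follows. The gate is written in terms of per-feature p-values from the chi-squared test, which yields the same decision as $\tilde{V}_{t} \geq CI^{+}$ evaluated at $\alpha'$; the function names are hypothetical, and the family-wise formula assumes independent tests.

```python
def sidak_alpha(alpha, p):
    """Per-feature level alpha' = 1 - (1 - alpha) ** (1 / p)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / p)


def family_wise_error(alpha_per_test, p):
    """epsilon = 1 - (1 - alpha) ** p for p independent tests."""
    return 1.0 - (1.0 - alpha_per_test) ** p


def independent_screen(p_values, alpha):
    """Feature-wise gate: non-equivalent if any feature's p-value is at or
    below the Sidak-adjusted level; also returns the failing feature indices."""
    a = sidak_alpha(alpha, len(p_values))
    failing = [j for j, pv in enumerate(p_values) if pv <= a]
    return ("non-equivalent" if failing else "equivalent"), failing


print(round(sidak_alpha(0.01, 3), 6))        # 0.003345
print(round(family_wise_error(0.01, 3), 6))  # 0.029701 without adjustment
print(independent_screen([0.20, 0.003, 0.50], 0.01))  # ('non-equivalent', [1])
```

Without adjustment, three independent tests at α = 0.01 carry roughly a 3% chance of at least one false rejection; the adjusted level restores the 1% family-wise target.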

This independent multivariate rule should be distinguished from a test of full joint-distribution equivalency. The independent approach controls the family-wise error rate across the set of measured features and therefore supports a multifeature decision rule, but it does not by itself establish that the full joint distribution is equivalent. In other words, two processes may appear equivalent on each measured variable individually while still differing in their dependence structure. For this reason, the independent method should be interpreted as a practical and scalable screening rule for multifeature comparability, whereas dependent multivariate equivalency remains the more direct representation of joint-distribution similarity.

3.3. Modeling Effect Size for a Single Equivalency Test for Power Analysis

The minimal sample size (n) needed to achieve a target power level was estimated for a given effect size (degree of dissimilarity between the reference and candidate processes), adjusted significance level $\alpha'$, and number of measured features p using simulation-based power analysis [54]. The analysis examined how n scales with p under the independent feature-wise methodology. Here, n is the number of parts or artifacts produced by the candidate process, and p is the number of measurements or variables taken from each sample (e.g., hardness, roughness, porosity, or mechanical properties). Under equivalency for each variable, candidate-process observations are expected to be uniformly distributed across the bins defined by the reference process.

Closed-form power and sample-size expressions are not generally available for this decision rule because the workflow combines percentile-defined reference bins, a truncated-geometric simulation alternative, sequential decision logic, multiple-comparison correction, and downstream cost optimization. Simulation therefore serves as the practical approximation tool for mapping the operating characteristics of candidate designs when exact analytical solutions do not exist or are not tractable. The fitted response surfaces also reduce computational burden by smoothing over a discrete simulation grid, allowing nearby design choices to be compared without rerunning a full Monte Carlo analysis for every possible value of n, p, and $\lambda$.

Under the alternative hypothesis, the distribution of candidate counts across the reference bins is asymmetrical. The dissimilarity of the candidate process to the reference process for a single feature can be modeled with a truncated geometric distribution, which gives the probability that a candidate observation will occur in bin $k$:

P(X = k) = \frac{(1 - \lambda)^{k - 1} \lambda}{1 - (1 - \lambda)^{b}}, \quad k = 1, 2, \dots, b \qquad (5)

where $\lambda$ is the rate parameter and determines how biased the distribution is. If $\lambda \approx 0$, the truncated geometric distribution is nearly uniform. Conversely, if $\lambda = 1$, all observations occur in the first bin.

The rate parameter $\lambda$ can therefore be understood as the effect size of this model, normalized so that the minimal effect is set at 0 and the maximal effect is 1. This truncated geometric model is used here as a controlled simulation device for generating departures from the reference-bin uniform distribution. It is not intended to represent all possible AM process shifts, but rather to provide a monotonic and interpretable way to vary distributional deviation across simulation studies. In physical AM terms, larger values of $\lambda$ can be interpreted as stronger systematic departures from the reference percentile-bin distribution. Such departures could arise when changes in laser power, scan speed, hatch spacing, scanner-delay behavior, build location, thermal history, powder condition, or post-processing shift responses such as porosity, surface roughness, residual stress, hardness, tensile strength, elongation, or geometric error away from the reference process [21,48,55,56]. Thus, $\lambda$ is not a material-specific process parameter; it is a generic distributional effect-size parameter that represents how strongly candidate observations concentrate in a subset of the reference-defined bins.
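A minimal Python sketch of Equation (5), assuming NumPy, makes the role of $\lambda$ concrete. The function names `truncated_geometric_pmf` and `sample_candidate_bins` are illustrative and not part of the authors' package; the sketch computes the bin probabilities, draws one simulated candidate sample, and allows the limiting cases described above ($\lambda \to 0$ nearly uniform, $\lambda = 1$ all mass in bin 1) to be checked directly.

```python
import numpy as np

def truncated_geometric_pmf(lam: float, b: int) -> np.ndarray:
    """Bin probabilities P(X = k), k = 1..b, from Equation (5).

    lam is the rate (effect-size) parameter and must lie in (0, 1].
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    k = np.arange(1, b + 1)
    return (1.0 - lam) ** (k - 1) * lam / (1.0 - (1.0 - lam) ** b)

def sample_candidate_bins(lam: float, b: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n candidate observations (bin labels 1..b) under the simulated alternative."""
    return rng.choice(np.arange(1, b + 1), size=n, p=truncated_geometric_pmf(lam, b))

rng = np.random.default_rng(2026)
print(np.round(truncated_geometric_pmf(0.25, 8), 3))  # probabilities decrease with k
counts = np.bincount(sample_candidate_bins(0.25, 8, 40, rng) - 1, minlength=8)
print(counts)  # bin counts for one simulated candidate sample of n = 40
```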

The same truncated-geometric sampling scheme cannot be applied naively to all $p$ variables without introducing dependence among the variables. If Equation (5) is used for all variables with the same value of $\lambda$, observations have a higher probability of co-occurring in the same cells, thereby simulating a form of discrete correlation. Consequently, the simulation workflow must control the degree of correlation among the variables.

3.4. Minimal Sample Size Needed to Evaluate p Variables

For each value of p, the sample size required to reach a power level of 0.8 was estimated (Figure 3). A power level of 0.8 is a common design target in fields such as biomedical research [57], ecology [58], social science [59], and software engineering [60], although no universal threshold appears to have been established for AM. Feasibility was therefore also evaluated at higher and lower power levels. To estimate the sample size corresponding to a given target power, a logistic regression was fit to the simulated rejection probabilities, where

P(n) = \frac{1}{1 + e^{-(\beta_{0} + \beta_{1} n)}} \qquad (6)

After fitting the model, the sample size at which $P(n) = 0.8$ is defined as $n^{*}$. It is obtained by solving Equation (6) for $n$:

n^{*} = \frac{-\beta_{0} - \ln(1/P(n) - 1)}{\beta_{1}} = \frac{-\beta_{0} - \ln(1/0.8 - 1)}{\beta_{1}} = \frac{-\beta_{0} - \ln(0.25)}{\beta_{1}} \qquad (7)

For reproducibility, all power analyses were run with a fixed random seed set at the start of the simulation scripts. Unless otherwise noted, simulations used $b = 8$ bins and a family-wise significance level of $\alpha = 0.01$, with the feature-wise significance level set by the Šidák adjustment $\alpha' = 1 - (1 - \alpha)^{1/p}$. For the independent multivariate power analyses, the grid included $n \in \{10, 15, \dots, 250\}$, $p \in \{1, 4, 7, \dots, 49\}$, and $\lambda \in \{0.10, 0.125, \dots, 0.25\}$. For each parameter combination, power was estimated from 100 Monte Carlo realizations, implemented as 10 independently generated datasets with 10 replicate trials per dataset. For the correlated-data stress test, pairwise correlations were drawn from $U(0.1, 0.9)$ and projected to the nearest positive-definite correlation matrix when needed. For the bivariate dependent-versus-independent comparison, the grid included $\rho \in \{0, 0.05, \dots, 1\}$, $n \in \{10, 15, \dots, 350\}$, and the same grid of $\lambda$ values.
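The Šidák adjustment and the simulation grids listed above can be written down directly. The sketch below assumes NumPy, and `sidak_alpha` is an illustrative name; it shows how the feature-wise level shrinks as $p$ grows while the family-wise level stays at $\alpha = 0.01$, and it confirms the grid sizes (49 values of $n$, 17 of $p$, 7 of $\lambda$) used in the independent power analyses.

```python
import numpy as np

def sidak_alpha(alpha_fw: float, p: int) -> float:
    """Feature-wise significance level alpha' = 1 - (1 - alpha)^(1/p)."""
    return 1.0 - (1.0 - alpha_fw) ** (1.0 / p)

# Simulation grids for the independent multivariate power analyses.
n_grid = np.arange(10, 251, 5)                           # n in {10, 15, ..., 250}
p_grid = np.arange(1, 50, 3)                             # p in {1, 4, 7, ..., 49}
lam_grid = np.round(np.arange(0.10, 0.2501, 0.025), 3)   # lambda in {0.10, 0.125, ..., 0.25}

for p in (1, 4, 13, 49):
    print(p, round(sidak_alpha(0.01, p), 6))
print(len(n_grid), len(p_grid), len(lam_grid))           # grid sizes
```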

For each parameter combination, the estimated power curve was fit with a binomial logistic regression of rejection probability on sample size. The estimated sample size $n^{*}$ was defined as the value of $n$ for which the fitted curve reached 0.8 power. Fits were checked for convergence and for a positive slope, since power should increase with sample size under the simulated alternatives. If the fitted curve did not reach 0.8 within the tested sample-size range, or if the fitted model failed to converge or produced a non-positive slope, the corresponding $n^{*}$ estimate was flagged as unresolved rather than extrapolated beyond the simulated grid. These unresolved cases were excluded from cost-surface fitting unless the grid was extended and the simulation rerun. Confidence intervals for $n^{*}$ were not generated for the main figures because the purpose of the simulation workflow was design optimization rather than inference on $n^{*}$. The estimated sample-size values were propagated into the cost model to evaluate the expected nonlinear response surface of cost across the design space. In this sense, the fitted surface functions as a mean-field design approximation: it smooths the discrete Monte Carlo grid, reduces the number of parameter combinations that must be simulated directly, and enables efficient comparison of neighboring design choices. Generating uncertainty intervals at every grid point would require substantially larger nested or repeated Monte Carlo calculations across $n$, $p$, $\lambda$, $b$, and correlation structure. Such intervals may be useful for sensitivity analysis, but they are not required for identifying the expected cost-minimizing design under the specified assumptions because no hypothesis test is performed on $n^{*}$ itself.
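The fitting and resolution rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the binomial logistic regression is fitted with a hand-written damped Newton–Raphson loop (the software used by the authors is not specified in this section), sample size is standardized internally for numerical stability, and the helper names `fit_logistic_power` and `n_star` are hypothetical. Following the text, `n_star` applies Equation (7) but returns `None` (unresolved) when the fit did not converge, the slope is non-positive, or the 0.8-power point lies outside the simulated grid.

```python
import numpy as np
from scipy.special import expit

def _loglik(beta, X, y, t):
    # Binomial log-likelihood up to a constant; logaddexp avoids overflow.
    eta = X @ beta
    return np.sum(y * eta - t * np.logaddexp(0.0, eta))

def fit_logistic_power(n, rejections, trials, max_iter=200, tol=1e-10):
    """Fit P(n) = 1 / (1 + exp(-(b0 + b1 n))) to rejection counts (Equation (6)).

    Damped Newton-Raphson on a standardized predictor.
    Returns (b0, b1, converged) on the original n scale.
    """
    n = np.asarray(n, dtype=float)
    y = np.asarray(rejections, dtype=float)
    t = np.asarray(trials, dtype=float)
    mu, sd = n.mean(), n.std()
    X = np.column_stack([np.ones_like(n), (n - mu) / sd])
    beta = np.zeros(2)
    converged = False
    for _ in range(max_iter):
        prob = expit(X @ beta)
        grad = X.T @ (y - t * prob)
        hess = X.T @ (X * (t * prob * (1.0 - prob))[:, None])
        step = np.linalg.solve(hess, grad)
        ll_old, scale = _loglik(beta, X, y, t), 1.0
        # Step halving keeps each update from decreasing the likelihood.
        while _loglik(beta + scale * step, X, y, t) < ll_old and scale > 1e-8:
            scale *= 0.5
        beta = beta + scale * step
        if np.max(np.abs(scale * step)) < tol:
            converged = True
            break
    b1 = beta[1] / sd
    b0 = beta[0] - beta[1] * mu / sd
    return b0, b1, converged

def n_star(b0, b1, converged, n_min, n_max, target=0.8):
    """Equation (7); None marks an unresolved estimate (no extrapolation)."""
    if not converged or b1 <= 0:
        return None
    value = (-b0 - np.log(1.0 / target - 1.0)) / b1
    return float(value) if n_min <= value <= n_max else None

# Hypothetical power curve with b0 = -4, b1 = 0.1, using exact expected counts.
n_grid = np.arange(10, 251, 5)
trials = np.full(n_grid.shape, 100)
rejections = trials * expit(-4.0 + 0.1 * n_grid)
b0, b1, ok = fit_logistic_power(n_grid, rejections, trials)
print(round(n_star(b0, b1, ok, n_grid.min(), n_grid.max()), 2))  # 53.86
```

Because the synthetic counts are exact expectations, the fit recovers the assumed coefficients; with real Monte Carlo rejection counts, the estimates will carry simulation noise.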

Next, $n^{*}$ was estimated for each value of $p$, showing that $n^{*}$ decreases as the number of measured variables increases (Figure 4A). This relationship is well described by a power law in $p$, and nonlinear least squares was used to fit the function:

n^{*} = \gamma p^{\nu} + \zeta \qquad (8)

where $\zeta$ is the offset (saturation point), $\gamma$ is the amplitude, and $\nu$ is the scaling exponent.
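Equation (8) can be fitted by nonlinear least squares with `scipy.optimize.curve_fit`. The $n^{*}$ values below are synthetic, generated from assumed parameters ($\gamma = 200$, $\nu = -0.5$, $\zeta = 20$) solely to show that the fit recovers them; they are not results from this study, and the starting values `p0` are an assumption that may need adjusting for real data.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(p, gamma, nu, zeta):
    """Equation (8): n* = gamma * p**nu + zeta."""
    return gamma * np.power(p, nu) + zeta

# Synthetic n* values from assumed parameters (illustration only, not study results).
p_grid = np.arange(1, 50, 3).astype(float)
n_star_values = power_law(p_grid, 200.0, -0.5, 20.0)

# Starting values are an assumption; noisy simulated n* estimates may need others.
params, cov = curve_fit(power_law, p_grid, n_star_values, p0=(150.0, -0.4, 15.0), maxfev=10000)
gamma_hat, nu_hat, zeta_hat = params
print(np.round(params, 3))
```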

3.5. Cost of Producing and Measuring Parts

Qualification and comparability studies in AM are often limited not by sample size alone, but by the combined cost of build preparation, part fabrication, post-processing, and measurement. These costs are not always incurred independently for each specimen: in many AM workflows, multiple artifacts or coupons are produced within the same batch, so fabrication costs may be constrained by build plate occupancy, setup time, machine time, and batch-level processing rather than by a purely per-part cost. At the same time, many downstream measurements are incurred on a per-feature or per-specimen basis and may remain avoidable even after a build has been completed. A deliberately simplified cost model is therefore used to compare experimental designs across different candidate sample sizes n and numbers of measured features p. The model should be interpreted as a design-oriented composite cost metric rather than a complete economic model of AM production. Costs are expressed both in normalized units, relative to the cost of producing one artifact or specimen, and in illustrative monetary units based on estimates from a single AM facility. This makes it possible to compare the relative efficiency of different multivariate equivalency designs while acknowledging that facility-specific build layouts, batching constraints, post-processing routes, and metrology workflows will change the absolute costs.

The overall cost of an experiment ($C$) can be decomposed into the cost of printing all of the artifacts ($C^{(a)}$) and the cost of taking all of the necessary measurements of those artifacts ($C^{(m)}$):

C = C^{(a)} + C^{(m)} \qquad (9)

Assuming that production costs scale linearly with the number of manufactured parts or batches, the cost of printing is:

C^{(a)} = n^{*} \Omega \qquad (10)

where $n^{*}$ is the number of printed parts and $\Omega$ is the monetary cost of producing a single part. The cost of measurement is:

C^{(m)} = n^{*} \sum_{j = 1}^{p} c_{j}^{(m)} \qquad (11)

where $c_{j}^{(m)}$ is the cost of measurement $j$. In practice, the costs $c_{j}^{(m)}$ differ from one measurement to another, but when comparing the relative costs of different experiments, it is sufficient to assume that each measurement is approximately equal in cost and consumes a fixed fraction of the cost of manufacturing a single part, so:

C^{(m)} \approx n^{*} p \delta \Omega \qquad (12)

where $\delta$ is the average fractional cost of each measurement, so $0 < \delta < 1$. The overall cost is therefore:

C = n^{*} \Omega + n^{*} p \delta \Omega = n^{*} \Omega (1 + p \delta) \qquad (13)

To express the cost of the experiment in units of printed artifacts (Figure 4), $\Omega = 1$ is set and Equation (8) is substituted for $n^{*}$:

C = n^{*} (1 + p \delta) = (\gamma p^{\nu} + \zeta)(1 + p \delta) \qquad (14)
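Equation (14) turns the fitted power law into a cost curve over $p$, and the cost-minimizing number of features can be read off numerically. The sketch below assumes NumPy and reuses hypothetical power-law parameters ($\gamma = 200$, $\nu = -0.5$, $\zeta = 20$) with $\delta = 0.08$. Under these assumed values, the minimum falls at an intermediate $p$, illustrating the trade-off discussed in Section 3.6; the location of the minimum depends entirely on $\gamma$, $\nu$, $\zeta$, and $\delta$.

```python
import numpy as np

def unit_cost(p, gamma, nu, zeta, delta):
    """Equation (14): total cost in units of printed artifacts (Omega = 1)."""
    p = np.asarray(p, dtype=float)
    return (gamma * p ** nu + zeta) * (1.0 + p * delta)

# Hypothetical power-law parameters (illustration only) and delta = 0.08.
p_values = np.arange(1, 50)
costs = unit_cost(p_values, gamma=200.0, nu=-0.5, zeta=20.0, delta=0.08)
best_p = int(p_values[np.argmin(costs)])
print(best_p, round(float(costs.min()), 2))  # 8 148.77 under these assumptions
```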

The present cost model is intentionally heuristic and is meant to support relative comparison among study designs rather than to reproduce the full economics of any specific AM workflow; it should be read as a reduced mean-field cost surface rather than a full nonlinear economic model. In practice, build-level batching and packing, setup time, machine amortization, powder reuse, technician and operator scheduling, post-processing queues, metrology throughput, and destructive-versus-nondestructive measurement choices (including destructive-test dependencies) can all alter the effective marginal cost structure. The value of the model presented here is therefore not that it captures every practical detail, but that it provides a transparent framework for reasoning about how sample size, number of measured features, and per-measurement cost interact. Process-specific economic modeling remains an important direction for future work: such factors can be incorporated in future process-specific versions by replacing the linear artifact and measurement terms with facility-specific cost functions.

The reduction in $n^{*}$ with increasing $p$ occurs because the independent decision rule rejects process equivalency if any measured feature is non-equivalent. When each feature carries the same simulated effect size, adding features increases the number of opportunities to detect a difference. This result should therefore not be interpreted to mean that adding arbitrary measurements always improves study efficiency. In practice, redundant, noisy, weakly informative, or highly correlated features may increase cost without improving discriminatory power.

3.6. Optimal Sampling Strategy for Independent Multivariate Equivalency

Because the cost of a single measurement is typically a fraction of the cost of producing a single part, total experimental cost can sometimes be reduced by taking more measurements from fewer parts. However, increasing $p$ has diminishing returns (Equation (8)): beyond a certain point, the cost of additional measurements can outweigh the savings from producing fewer parts, producing a minimum in total cost at intermediate values of $p$ (Figure 4). This result depends on the selected values of $\delta$ and $\lambda$ and should therefore be interpreted as design-specific rather than universal. The following analysis shows how cost-efficient designs can be selected across effect sizes.

First, $n^{*}$ was calculated according to the Monte Carlo methods described in Section 3.4 across values of $\lambda \in \{0.1, 0.125, \dots, 0.25\}$. A lower limit of 0.1 was selected because smaller values produced very large sample-size requirements and may represent effect sizes with limited practical importance in many AM requalification settings. An upper limit of 0.25 was selected because departures of this magnitude were detectable at relatively small sample sizes, so designs optimized for smaller departures would also detect larger departures. The analysis then iterated over $p = 1, 4, 7, \dots, 49$, estimating $n^{*}$ for each combination of $\lambda$ and $p$. Equation (14) was used to estimate the total design cost (Figure 5). Given the tight correspondence between $n^{*}$ and $p$ ($R^{2} > 0.99$ for most regressions), the final response surface was obtained by interpolating $n^{*}$ across values of $p$ (Figure 5). The illustrative value $\delta = 0.08$ was used, based on a single source of facility-specific cost estimates in which the marginal cost of one additional measurement was approximately 8% of the cost assigned to producing one artifact or specimen ($\$334 / \$4165 \approx 0.08$). This value is used only as an illustrative normalized measurement–cost ratio; in practice, $\delta$ will depend on the facility, measurement system, part geometry, build layout, batching constraints, and post-processing workflow.

Next, all designs satisfying $38 \leq n^{*} \leq 42$ were examined. This range was selected because at least 40 samples are required to establish process stability in the reference distribution [20]; when appropriate, the same approximate sample size can also support diagnostic stability assessment of the candidate process. This filtering step produced the possible designs shown as black and green points in Figure 5. For each effect size, the minimum-cost feasible design was selected. The black line connecting the green points summarizes this cost-efficient design path across effect sizes and identifies a set of optimal trade-offs: lower-cost designs can be obtained by reducing $p$, whereas smaller detectable effect sizes generally require increasing $p$. Within the simulated design space, values of $p$ between approximately 10 and 15 provided a practical balance because larger values increased cost while producing relatively small gains in discriminatory power. Additional details on the power-analysis simulations are given in Section 4.

4. Stress Tests for Independent Multivariate Equivalency

As the name suggests, the independent multivariate equivalency approach assumes that response variables—such as yield strength and elongation at break—are statistically independent, an assumption that is rarely, if ever, valid in practice for metal AM process–structure–property data [48,55]. In AM data, such correlations can arise because multiple measured responses share the same thermal history, scan strategy, build location, powder history, post-processing route, or measurement system [36,37,38,39,40,41]. The expected correlation structure should therefore be treated as part of feature selection and study design rather than as an after-the-fact diagnostic. This independence assumption constitutes a potential vulnerability in the method and must be explicitly evaluated prior to application. To assess the robustness and practical limitations of this approach, two stress tests were conducted: (1) applying independent multivariate equivalency to datasets in which the response variables exhibit correlation, and (2) comparing the cost-efficiency of the independent method to that of a proposed dependent equivalency framework in a simplified bivariate scenario. Correlations are first introduced into simulated measurement data, the effect of these dependencies on the independent method is then evaluated, and the dependent bivariate method is subsequently compared with the independent approach.

4.1. Setting Correlations Between Discrete Random Variables

To test the effect of correlations on the independent multivariate equivalency method, samples were drawn from a multivariate normal distribution whose correlations were manually set. The marginal distributions were then transformed to generate positively correlated truncated geometric distributions.

To generate correlated ordinal categorical data, pairwise Pearson correlations between the variables of a dataset were first specified by creating the correlation matrix $R \in \mathbb{R}^{p \times p}$, where $R_{ij} \sim U(\ell, u)$ for all $i < j$ and $R_{ii} = 1$, with $R_{ji} = R_{ij}$ so that the matrix is symmetric across the diagonal. The matrix was then adjusted to ensure that it was a valid correlation matrix by finding the nearest positive definite matrix [61], executed with the R package Matrix [62]. For the situation where all variables are independent, $R_{ij} = 0$ for all $i \neq j$ while $R_{ii} = 1$.

Next, samples were drawn from a multivariate normal distribution with a mean of 0 and covariance $R$:

Z_{m} \sim N(0, R), \quad m = 1, \dots, n \qquad (15)

where $Z_{m}$ is a $p$-dimensional vector of latent variables.

The generated normal latent variables were then transformed into uniform variables on the interval $[0, 1]$ by applying the cumulative distribution function (CDF) of the standard normal distribution, $\Phi$:

U = \Phi(Z) \qquad (16)

These uniform variables were subsequently mapped onto ordinal categories according to a truncated geometric distribution defined by the truncation parameter $b$ (the number of bins) and the rate parameter $\lambda$. Specifically, the truncated geometric cumulative distribution function was computed as:

P(X \leq k) = \frac{1 - (1 - \lambda)^{k}}{1 - (1 - \lambda)^{b}}, \quad k = 1, 2, \dots, b \qquad (17)

Each uniform variable $U_{im}$ was categorized using:

X_{im} = \min\{k : U_{im} \leq P(X_{i} \leq k)\} \qquad (18)

yielding the correlated ordinal observations. The number of ordinal observations in each bin was then counted to calculate the $\chi^{2}$ statistic (Equation (1)).
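The correlated-data generator of Equations (15)–(18) is a Gaussian copula with truncated-geometric marginals, and a minimal NumPy/SciPy sketch follows. Two assumptions should be flagged. First, the positive-definite repair uses simple eigenvalue clipping followed by rescaling to a unit diagonal, which is a cruder substitute for the nearest-correlation-matrix algorithm in R's `Matrix::nearPD` used by the authors. Second, `feature_chi2` assumes that Equation (1) is Pearson's statistic with uniform expected counts under the null. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def random_correlation(p, low, high, rng):
    """Symmetric matrix with R_ij ~ U(low, high) for i < j and a unit diagonal."""
    R = np.eye(p)
    upper = np.triu_indices(p, k=1)
    R[upper] = rng.uniform(low, high, size=upper[0].size)
    return R + np.triu(R, k=1).T

def clip_to_correlation(R, eps=1e-6):
    """Eigenvalue clipping plus unit-diagonal rescaling (simpler than Matrix::nearPD)."""
    w, V = np.linalg.eigh(R)
    A = (V * np.maximum(w, eps)) @ V.T
    A = (A + A.T) / 2.0                # remove round-off asymmetry
    d = np.sqrt(np.diag(A))
    return A / np.outer(d, d)

def truncated_geometric_cdf(lam, b):
    """Equation (17) evaluated at k = 1..b."""
    k = np.arange(1, b + 1)
    return (1.0 - (1.0 - lam) ** k) / (1.0 - (1.0 - lam) ** b)

def correlated_ordinal(n, R, lam, b, rng):
    """Gaussian-copula draws mapped to bins 1..b (Equations (15)-(18))."""
    Z = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=n)  # Eq. (15)
    U = norm.cdf(Z)                                                 # Eq. (16)
    cdf = truncated_geometric_cdf(lam, b)                           # Eq. (17)
    X = np.searchsorted(cdf, U, side="left") + 1                    # Eq. (18): smallest k with U <= cdf(k)
    return np.minimum(X, b)

def feature_chi2(X, b):
    """Pearson chi-square per feature against uniform bin counts (assumed form of Eq. (1))."""
    expected = X.shape[0] / b
    counts = np.stack([np.bincount(X[:, j] - 1, minlength=b) for j in range(X.shape[1])])
    return ((counts - expected) ** 2 / expected).sum(axis=1)

rng = np.random.default_rng(7)
R = clip_to_correlation(random_correlation(5, 0.1, 0.9, rng))
X = correlated_ordinal(200, R, lam=0.15, b=8, rng=rng)
print(np.round(feature_chi2(X, 8), 2))
```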

4.2. Effect of Using Correlated Data in Independent Multivariate Equivalency

Two sets of simulations were performed to test the effect of correlated data. In the first set, independent multivariate equivalency was performed as described in Section 3 using the correlated-data generation procedure from Section 4.1, but with $R_{ij} = 0$ for all $i \neq j$ and $R_{ii} = 1$. In the second set of simulations, $R_{ij} \sim U(0.1, 0.9)$ for all $i \neq j$. Both simulation sets iterated over $n \in \{10, 15, \dots, 250\}$, $\lambda \in \{0.1, 0.125, \dots, 0.25\}$, and $p \in \{1, 4, 7, \dots, 49\}$. For every parameter combination, a single set of correlated values was drawn (even if the target correlation was 0), and 10 iterations were run. This was repeated for 10 separate draws for each parameter combination, resulting in a total of $10 \times 10 \times 7 \times 17 \times 49 \times 2 = 1.1662 \times 10^{6}$ simulations. The grid of candidate sample sizes was searched for the smallest value $n^{*}$ at which the estimated power reached or exceeded 0.8; if power did not reach 0.8 within the tested range, the design was treated as infeasible over that grid and no interpolated or extrapolated value was assigned. Random seeds were controlled at the script level so that the simulated grids could be reproduced while still allowing independent draws across parameter combinations. When logistic smoothing was used to summarize power curves, fitted curves were checked for convergence and for the expected monotonic increase in power with sample size; non-convergent or non-monotone fits were resolved by using the empirical grid estimate of $n^{*}$ rather than the fitted value. For each simulation condition, the unit cost was calculated for the tested experimental design that achieved a power level of 0.8.
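The grid search for $n^{*}$ and the simulation count quoted above are straightforward to reproduce. In this sketch, which assumes NumPy and uses a hypothetical helper `smallest_n_reaching` with made-up power values, a design that never reaches 0.8 on the tested grid returns `None` rather than an interpolated or extrapolated value, mirroring the infeasibility rule.

```python
import numpy as np

def smallest_n_reaching(n_grid, power, target=0.8):
    """Smallest tested n with estimated power >= target; None if infeasible on the grid."""
    n_grid = np.asarray(n_grid)
    hits = np.flatnonzero(np.asarray(power) >= target)
    return int(n_grid[hits[0]]) if hits.size else None

# Hypothetical empirical power estimates on part of the n grid.
n_grid = [10, 15, 20, 25, 30]
print(smallest_n_reaching(n_grid, [0.20, 0.55, 0.79, 0.81, 0.93]))  # 25
print(smallest_n_reaching(n_grid, [0.10, 0.20, 0.30, 0.40, 0.50]))  # None

# Simulation count for the correlated-data stress test (Section 4.2).
draws, iterations, n_lambda, n_p, n_n, n_sets = 10, 10, 7, 17, 49, 2
print(draws * iterations * n_lambda * n_p * n_n * n_sets)  # 1166200
```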

The results show that, although the unit costs from correlated and uncorrelated data are themselves strongly correlated (Figure 6), in nearly all cases the costs from correlated data are higher than those from uncorrelated data. The only exception is $p = 1$, for which correlation has no effect on the data. The additional costs incurred from correlated data increase with $p$ (Figure 6A) and with decreasing effect size (Figure 6B). Taken together, these results highlight the need to include correlations in sample-size estimates: designs sized by independent multivariate equivalency simulations that assume an uncorrelated base dataset will be underpowered when the measured features are in fact correlated.

4.3. Deriving Dependent Multivariate Equivalency for the Bivariate Case

Dependent multivariate equivalency differs from independent multivariate equivalency insofar as it defines cross-bins (cells) in which observations can occur, creating a plane of potential locations for observations in the bivariate case (see Figure 1B), or a hypercube in higher dimensions. Determining whether a candidate joint distribution is different from a reference joint distribution first requires setting the limits of each cell using percentiles defined by measurements made on the reference process. In cases where the attributes of the additively manufactured product are correlated, one then calculates the expected counts for each cell given the null hypothesis that there is no difference between the reference and candidate processes at the specified level of correlation. Then one counts how many observations derived from the candidate process occur in each cell. Next, one calculates $\chi^{2}$ based on the difference between the observed and expected counts of each cell (Equation (1)). From this, one derives $\tilde{V}$ (Equation (2)) and then performs the equivalency test (Equation (3)).

The primary problem with establishing a dependent multivariate equivalency method is that, given a high enough degree of correlation between the attributes, there will be cases where the expected counts of some cells will be 0, regardless of whether the candidate and reference processes are equivalent. Consider, for example, the bivariate case presented in Figure 2B. Here, the cells to the upper left and lower right will have an expected count of 0. This is disallowed in the calculation of the $\chi^{2}$ statistic, as it would require dividing by 0 (see Equation (1)). This problem is avoided by defining a new test statistic, a modified version of the $\chi^{2}$ statistic in which one is added to every expected count so that all expected counts are at least one. This new statistic, $\chi_{+1}^{2}$, is calculated as:

\chi_{+1}^{2} = \sum_{i=1}^{b^{2}} \frac{(O_{i} - (E_{i} + 1))^{2}}{E_{i} + 1} \qquad (19)

This expression can be expanded to isolate the original $\chi^{2}$ component:

\chi_{+1}^{2} = \sum_{i=1}^{b^{2}} \frac{(O_{i} - E_{i})^{2}}{E_{i}} - \sum_{i=1}^{b^{2}} \frac{O_{i}^{2}}{E_{i}} + \sum_{i=1}^{b^{2}} \frac{O_{i}^{2}}{E_{i} + 1} + b^{2} \qquad (20)

showing that this new statistic equals the original $\chi^{2}$ statistic plus a correction term arising from adding one to each expected count. Further simplification, using $\sum_{i} O_{i} = \sum_{i} E_{i} = N$, gives the compact expression:

\chi_{+1}^{2} = \sum_{i=1}^{b^{2}} \frac{O_{i}^{2}}{E_{i} + 1} + b^{2} - N \qquad (21)

The correction term in Equation (20) approaches zero when the expected counts are large: for $E_{i} \gg 1$ and $O_{i} \approx E_{i}$, $\sum_{i} O_{i}^{2}/(E_{i} + 1) - \sum_{i} O_{i}^{2}/E_{i} \approx -b^{2}$, which cancels the added $b^{2}$. The distribution of $\chi_{+1}^{2}$ is therefore expected to be approximately a $\chi^{2}$ distribution. Figure 7 shows this relationship visually by comparing simulated null distributions of the standard statistic and the modified statistic against the corresponding theoretical density. The close overlap between the empirical histograms and the theoretical curve provides a graphical check that the modified statistic behaves similarly to the standard $\chi^{2}$ statistic under the null condition.
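A short numerical check, assuming NumPy, confirms that Equations (19) and (21) agree and that the statistic remains finite when some expected counts are zero. The 2 × 2 counts below are hypothetical, chosen so that the off-diagonal cells have zero expected count, as in a strongly correlated bivariate case. Equation (21) relies on the observed and expected counts both summing to $N$, and `O.size` plays the role of $b^{2}$.

```python
import numpy as np

def chi2_plus1(observed, expected):
    """Equation (19): one is added to every expected count before the chi-square sum."""
    O = np.asarray(observed, dtype=float).ravel()
    E1 = np.asarray(expected, dtype=float).ravel() + 1.0
    return float(np.sum((O - E1) ** 2 / E1))

def chi2_plus1_compact(observed, expected):
    """Equation (21): sum O^2 / (E + 1) + (number of cells) - N."""
    O = np.asarray(observed, dtype=float).ravel()
    E = np.asarray(expected, dtype=float).ravel()
    return float(np.sum(O ** 2 / (E + 1.0)) + O.size - O.sum())

# Hypothetical 2 x 2 example: off-diagonal cells have zero expected count.
expected = np.array([[20.0, 0.0], [0.0, 20.0]])
observed = np.array([[18, 2], [1, 19]])
print(chi2_plus1(observed, expected), chi2_plus1_compact(observed, expected))  # both 34/21
```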

To further validate this claim, the analysis tested whether the modified $\chi_{+1}^{2}$ distributions differed significantly from the standard $\chi^{2}$ distribution using a two-sample Kolmogorov–Smirnov (KS) test across a range of degrees of freedom. For degrees of freedom $d \in \{2, 3, 4, 5, 10, 20, 50, 100\}$, the analysis generated 1000 samples from the standard $\chi^{2}$ distribution with $d$ degrees of freedom and 1000 samples of the modified $\chi_{+1}^{2}$ statistic. For the modified statistic, observations were drawn from a multinomial distribution in which every bin had an equal probability of receiving an observation, and the expected count per bin was 1000 divided by the number of bins.

The KS test was used to compare the empirical cumulative distribution functions (ECDFs) of the two distributions [63]. Since multiple comparisons were performed, the resulting p-values were adjusted using the Holm–Bonferroni method to control the family-wise error rate. Table 1 reports the KS statistics, unadjusted p-values, and Holm–Bonferroni-adjusted p-values. In all cases, the adjusted p-values exceeded the significance threshold ($\alpha = 0.05$), indicating no statistically significant difference between the distributions.
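The KS comparison with Holm–Bonferroni correction can be sketched with `scipy.stats.ks_2samp` and a small hand-written Holm adjustment. The assumptions here are that the statistic with $d$ degrees of freedom is built from $d + 1$ equiprobable cells and that each multinomial draw contains 1000 observations; the seed and helper names are illustrative, so the printed p-values will not reproduce Table 1.

```python
import numpy as np
from scipy.stats import chi2, ks_2samp

def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values (step-down, monotone, capped at 1)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = (m - np.arange(m)) * p[order]
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

def chi2_plus1_null(d, n_stats, n_obs, rng):
    """Null draws of Equation (21) for d + 1 equiprobable cells (assumed construction)."""
    k = d + 1
    counts = rng.multinomial(n_obs, np.full(k, 1.0 / k), size=n_stats)
    expected = n_obs / k
    return np.sum(counts ** 2 / (expected + 1.0), axis=1) + k - n_obs

rng = np.random.default_rng(11)
dfs = [2, 3, 4, 5, 10, 20, 50, 100]
raw = []
for d in dfs:
    reference = chi2.rvs(d, size=1000, random_state=rng)
    modified = chi2_plus1_null(d, 1000, 1000, rng)
    raw.append(ks_2samp(reference, modified).pvalue)
adjusted = holm_adjust(raw)
print(np.round(adjusted, 3))
```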

Given these results, $\chi_{+1}^{2}$ was treated analogously to $\chi^{2}$ for the exploratory dependent bivariate equivalency test. Equation (1) was therefore replaced with Equation (21), and equivalency testing then proceeded as described at the beginning of this section. Dependent multivariate equivalency is conceptually closer to a comparison of full joint distributions, whereas the independent method is best interpreted as a scalable multifeature screening rule with family-wise error control. Because the adjustment changes the expected-count structure, this dependent extension is treated as exploratory and used primarily for comparison with the independent method rather than as a fully established inferential procedure.

4.4. Comparing Independent and Dependent Multivariate Equivalency for the Bivariate Case

Two sets of simulations were performed to compare the independent and dependent multivariate equivalency methods. In one set, independent multivariate equivalency was performed as described in Section 3 using the correlated-data generation procedure from Section 4.1 with $R_{ij} = \rho$ for all $i \neq j$. The analysis iterated over $\rho \in \{0, 0.05, \dots, 1\}$. The same parameter grid was then evaluated with dependent multivariate equivalency, as described in Section 4.3. Both simulation sets iterated over $n \in \{10, 15, \dots, 350\}$ and $\lambda \in \{0.1, 0.125, \dots, 0.25\}$; only the bivariate case was considered, so $p = 2$. For every parameter combination, a single set of correlated values was drawn and 20 iterations were run, resulting in a total of $20 \times 69 \times 7 \times 21 \times 2 = 4.0572 \times 10^{5}$ simulations. As above, $n^{*}$ was defined as the smallest tested sample size for which estimated power reached or exceeded 0.8, and designs that failed to reach the target within the tested range were treated as infeasible over the simulated grid. For each simulation condition, the unit cost was calculated for the tested experimental design that achieved a power level of 0.8.

The results show that different sets of conditions determine whether one equivalency strategy is more or less expensive than the other. Higher levels of correlation generally make dependent multivariate equivalency more expensive (Figure 8A), while smaller effect sizes also make it more expensive (Figure 8B). Generally, the independent method was less expensive across a wider range of conditions, particularly in the larger-effect-size regime that is most relevant when differences are practically detectable. It also has a more consistent cost profile; costs do not vary with $\rho$ for the independent method as much as they do for the dependent method. These results support use of the independent multivariate equivalency method as a scalable first-line screen, while further research should evaluate how both methods scale with larger values of $p$.

In applied studies, users should estimate the empirical correlation structure among candidate features during reference-process characterization and include that structure in simulation-based sample-size calculations whenever possible.
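One way to carry an estimated correlation structure into the simulation, offered here as an assumption rather than the authors' procedure, is to compute Spearman rank correlations among the reference features and convert them to latent Gaussian-copula correlations with the bivariate-normal identity $\rho = 2\sin(\pi \rho_{s}/6)$. Because ranks are invariant to monotone transformations, this estimate does not depend on the measurement scale of each feature. The resulting matrix may still need the positive-definite repair described in Section 4.1 before it is used to generate data; the data and function name below are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def latent_correlation_from_reference(reference):
    """Spearman correlations mapped to Gaussian-copula (latent Pearson) correlations."""
    ranks = rankdata(reference, axis=0)
    rho_s = np.corrcoef(ranks, rowvar=False)
    R = 2.0 * np.sin(np.pi * rho_s / 6.0)  # bivariate-normal identity
    np.fill_diagonal(R, 1.0)
    return R

# Hypothetical reference data: correlated latent normals on a non-normal scale.
rng = np.random.default_rng(3)
R_true = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.0], [0.3, 0.0, 1.0]])
reference = np.exp(rng.multivariate_normal(np.zeros(3), R_true, size=20000))
R_hat = latent_correlation_from_reference(reference)
print(np.round(R_hat, 2))
```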

4.5. Outlier Sensitivity as a Diagnostic Use Case

Although percentile binning reduces the influence of extreme values, outliers can still alter the observed bin counts and thereby affect equivalency decisions, particularly under the independent multivariate rule where a single feature can drive rejection. For this reason, the recommended approach is that equivalency testing be paired with a standardized anomaly workflow: after an equivalency decision is reached, inspect per-feature tail occupancy (counts in boundary bins), compute univariate robust z-scores (e.g., MAD-based), and compute multivariate distances (e.g., Mahalanobis distance) to identify whether a small number of observations disproportionately influenced the outcome. This workflow is especially useful when non-equivalency is detected, since it helps identify whether the difference is diffuse (systematic distribution shift) or sparse (a small number of extreme anomalies).
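The anomaly workflow above can be sketched in a few NumPy functions: MAD-based robust z-scores (with the conventional 0.6745 scaling), Mahalanobis distances from the sample mean under the sample covariance, and boundary-bin (tail) occupancy counts. The function names, the planted outlier, and the common |z| > 3.5 rule of thumb used in the comments are illustrative assumptions, not thresholds prescribed by this article.

```python
import numpy as np

def robust_z(X):
    """MAD-based robust z-scores per feature: 0.6745 * (x - median) / MAD."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    return 0.6745 * (X - med) / mad

def mahalanobis_distances(X):
    """Distance of each row from the sample mean under the sample covariance."""
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    return np.sqrt(np.einsum("ij,jk,ik->i", centered, inv_cov, centered))

def tail_occupancy(bins, b):
    """Per feature: counts in the lowest (1) and highest (b) reference bins."""
    return np.stack([(bins == 1).sum(axis=0), (bins == b).sum(axis=0)], axis=1)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(100, 3)), [8.0, 8.0, 8.0]])  # row 100 is a planted outlier
d = mahalanobis_distances(X)
z = robust_z(X)
flagged = np.flatnonzero(np.any(np.abs(z) > 3.5, axis=1))    # rule-of-thumb cutoff
print(int(np.argmax(d)), np.round(z[100], 1), flagged)
```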

5. Validation Case Study

An experiment was performed to test the efficacy of the multivariate equivalency method using the same validation logic used for the univariate equivalency method [20]. In this design, a single process was used to create both a reference distribution and an expected-equivalent candidate distribution, while a second process was used to create an expected non-equivalent candidate distribution. If the equivalency method performs as intended, the reference distribution and the first candidate distribution should be classified as equivalent, whereas the reference and the second candidate distribution should be classified as non-equivalent. The case study first describes the experimental procedure and then follows the workflow highlighted in Figure 1 to demonstrate package use and validation results. All analysis was performed using the package functions for reference bin construction [64], per-feature equivalency testing with multiple-comparison control, and multivariate stability assessment via Hotelling's $T^{2}$. The purpose of the case study is to validate the statistical workflow on real AM data, not to claim that the specific artifact, feature set, or machine pair used here is universally representative of all AM qualification contexts.

Plate scans were selected for validation not only because they allow for efficient sampling of process outputs, but also because they exhibit relatively low measurement uncertainty compared with many bulk or mechanically tested properties. This is particularly important for equivalency testing, where the goal is to detect differences between distributions: excessive measurement noise can obscure true process differences or inflate apparent variability. Using plate scans reduces the contribution of measurement error to the observed distributions, allowing the equivalency analysis to more directly reflect differences in the underlying manufacturing process.

This validation study can be interpreted as a case study in scanner subsystem qualification. In laser-based powder bed fusion systems, the scanner plays a central role in determining energy delivery, local thermal history, melt pool behavior, and ultimately part quality. Scanner-delay parameters, including polygon delay, can influence laser directional changes, local thermal history, and microstructural outcomes in PBF-LB/M parts [56]. As such, the scanner represents a natural target for subsystem-level qualification. The present analysis demonstrates how multivariate equivalency can be used to assess whether a candidate scanner configuration produces outputs consistent with a reference configuration.

5.1. Experimental Methods

Plate scans were selected for the validation of the multivariate equivalency method proposed in this work. Unlike three-dimensional parts, plate scans consist of single-layer builds in which individual vectors or melt pools can be isolated for layer-wise analysis. This approach also reduces the time required to obtain and measure samples while maintaining a controlled experimental framework for comparing populations. In this study, two LPBF systems, the AconityMIDI+ and the SLM280 HL, were used to produce the populations required for the equivalency analysis. The AconityMIDI+ system (Aconity3D GmbH, Herzogenrath, Germany) features a continuous-wave Yb fiber laser (CFL-500, nLIGHT, Vancouver, WA, USA) with a maximum power of 500 W and is controlled by a scanner head co-developed by Aconity3D and Raylase GmbH (Weßling, Germany). This system was selected to produce the reference population because it provides extensive configurability across multiple process and scanner parameters. The same machine was also used to produce a candidate population that was expected to be identified as equivalent by the method proposed here. For the non-equivalent candidate population, samples were produced using the SLM280 HL system (Nikon SLM Solutions AG, Lübeck, Germany). This machine is equipped with two continuous-wave Yb fiber lasers with a maximum power of 400 W each and a SCANLAB intelliSCAN III 20 scanner controller (SCANLAB GmbH, Puchheim, Germany). In contrast to the AconityMIDI+, the SLM280 HL offers more limited user configuration of process and scanner parameters. Consequently, the SLM280 HL population was expected to differ from the AconityMIDI+ reference because the two systems differ in scanner hardware, process parameter accessibility, polygon-delay settings, machine control architecture, and the spatial relationship between the laser/scanner system and the scan plate.

To maintain a standardized process and reduce the potential influence of external confounding variables, all tests were produced under the same general conditions. Each plate was positioned at the center of the build plate of the corresponding system and maintained the same orientation relative to the gas flow direction. Because the machines differ in scanner configuration and laser position relative to the scan plate, nominally similar build plate positions do not necessarily imply identical laser-relative angles or scanner dynamics across systems. The surface of each plate was aligned to the focal plane along the z axis. All plate scans were also produced under manufacturing conditions, with the build chamber oxygen concentration maintained below 1000 ppm.

Following the procedure described in this publication, the reference population was produced by consecutively printing 20 plates with two samples per plate. Each sample was manually positioned at the center of the build platform before activating the laser to continue with the next subgroup. For the candidate populations, the samples were printed according to the sequential sampling strategy described in [20], which ultimately resulted in 40 samples for each measurement type. To expedite data collection for the purpose of validating the multivariate equivalency method, the 40 samples for both candidate populations were printed with a 20 min interval between successive sample increments. However, the data were not processed using the equivalency method during this phase. This approach was intended to simulate the scenario for three-dimensional parts, where the time interval between sequential samples would be required for data processing and assessment before additional sampling. By omitting the intermediate data processing while maintaining the planned validation framework, the overall experiment duration was substantially reduced. Measurements and analysis for the candidate populations were still conducted following the sequential order defined by the equivalency method. This reflects a practical AM workflow in which samples may be fabricated together but measurements, destructive tests, and data analysis can still be staged sequentially to reduce marginal testing effort.

After printing, the samples were imaged using a Keyence VHX 7000 4K microscope (Keyence Corporation, Osaka, Japan) at 150× magnification. The geometry selected for the validation experiment consisted of a quadrilateral containing three distinct corner angles: 45°, 90°, and 135°. These three corner-deviation measurements are the three variables used in the multivariate equivalency example. This geometry was intentionally selected to differentiate the present work from the previous univariate equivalency study. Whereas the earlier publication evaluated the 135° corner-deviation measurement for univariate validation [20], the present work evaluates the 45°, 90°, and 135° angle measurements simultaneously as multiple variates within the same population comparison. These variables are related because they are generated by the same scanner, scan strategy, and machine configuration, but they are not necessarily redundant because each angle can respond differently to scanner dynamics and polygon-delay behavior. Therefore, the purpose of this experiment was not only to compare corner deviation between populations, but also to determine whether the combined response of multiple angle-dependent measurements could be assessed through a multivariate equivalency framework.

The Keyence VHX 7000 software (version 1.4.17.3.1) was used to measure the corner deviation of the three angles through the corner deviation measurement tool. This measurement quantified the deviation between the scanned corner and the intersection of the centerlines of the vectors forming the corner, as illustrated in Figure 9. The vector centerlines were created manually and used to generate a bisector guideline that crossed the inner and outer edges of the corner. The distance from the centerline intersection to both edges of the corner vectors was then measured along the bisector guideline, ensuring that both measurements were taken at the same angle. The corner deviation value was calculated by adding the measurements from both edges. Measurements falling within the bisector were assigned positive values, while measurements outside the bisector were assigned negative values. An ideal corner would have a corner deviation of zero, indicating that the centerline intersection is equidistant from the vector edges forming the corner. Larger positive or negative magnitudes indicate larger departures from this ideal corner geometry, with the sign indicating whether the deviation falls within or outside the bisector. Corner deviation was selected as the metric for this study because it can be influenced by polygon delay, which is a hidden parameter in some LPBF systems. The magnitude of this deviation is expected to depend on angle because scanner deceleration, acceleration, dwell behavior, and delay settings can affect acute, right, and obtuse corners differently. Equivalency was determined using the open-source R package MultivariateEquivalency [64].

This case study provides a proof-of-workflow validation rather than a comprehensive validation across all AM qualification contexts. The results show that the method behaves as expected for one LPBF plate-scan setting, one feature set, and one expected-equivalent/non-equivalent comparison. Broader validation across materials, machines, artifact geometries, destructive and nondestructive measurements, and known perturbation magnitudes will be needed before the framework can be treated as a general qualification protocol.

5.2. Descriptive and Correlation Structure of the Validation Data

Before performing equivalency testing, the empirical distributions of the three corner-deviation variables were summarized for each process. As illustrated in Figure 9, corner deviation quantifies the lateral displacement between the measured scan track centerline and the expected centerline near a programmed change in scan direction. Larger values indicate greater geometric departure from the intended toolpath, meaning that the deposited/melted track deviates farther from the nominal corner geometry. Smaller values indicate closer agreement between the measured track and the intended centerline. The magnitude of this deviation is expected to depend on the programmed angle because each angle imposes a different scanner-direction change and therefore a different combination of deceleration, acceleration, and local heat accumulation near the corner. Sharper or more abrupt changes in direction can produce different overshoot, lag, or rounding behavior than more gradual changes, so the 45°, 90°, and 135° features should be treated as related but non-identical geometric responses. The reference process had mean corner deviations of 229.62, 162.24, and 80.85 microns for the 45°, 90°, and 135° features, respectively. The expected-equivalent AconityMIDI+ candidate process produced similar values, with means of 230.43, 162.05, and 83.21 microns. In contrast, the SLM280 HL candidate differed most strongly for the 90° and 135° features, with means of 192.50 and 118.26 microns, respectively, while the mean of the 45° feature was close to the reference mean (227.13 versus 229.62 microns). This 45° result illustrates why single-variable or mean-based comparisons can be misleading: if only this feature, or only its mean, had been examined, the SLM280 HL candidate might have appeared similar to the reference even though the multivariate and distributional results show that the process is not equivalent. These raw distributions are shown in Figure 10, and descriptive statistics are reported in Table 2.

The analysis also evaluated whether the three corner-deviation variables were correlated within each process. In the reference process, the 90° and 135° features were moderately negatively correlated ($r = -0.36$, unadjusted $p = 0.023$), although this relationship was no longer significant after Holm adjustment for the three pairwise comparisons within the reference process ($p_{\text{adj}} = 0.069$). All remaining pairwise comparisons were weak and not statistically significant, including all comparisons within the AconityMIDI+ and SLM280 HL candidate processes. Because the only nominally significant correlation did not survive within-process correction, the three corner-deviation variables were treated as approximately independent for the purposes of the independent feature-wise multivariate equivalency workflow. Physically, this result is plausible because the three corners can be affected differently by angle-specific delay behavior, scanner dynamics, and the particular machine configuration used to create the vector turns. The observed correlation structure is summarized in Table 3 and Figure 11.
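As a minimal sketch of the Holm step-down correction applied within each process, the Python function below multiplies the smallest of $m$ p-values by $m$, the next smallest by $m-1$, and so on, while enforcing monotonicity. It is a generic implementation written for illustration, not code taken from the MultivariateEquivalency package. Only the value 0.023 comes from the reference-process analysis; the other two p-values (0.40 and 0.75) are hypothetical placeholders for the non-significant comparisons.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiplier is m for the smallest p-value, m - 1 for the next, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# 0.023 is the reported 90°/135° p-value; 0.40 and 0.75 are hypothetical placeholders.
p_reference = [0.023, 0.40, 0.75]
print([round(p, 3) for p in holm_adjust(p_reference)])  # [0.069, 0.8, 0.8]
```

With three comparisons, the reported $p_{\text{adj}} = 3 \times 0.023 = 0.069$ is reproduced regardless of the placeholder values, because 0.023 is the smallest of the three p-values.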

5.3. Experimental Results

The reference AconityMIDI+ process was first evaluated using a Hotelling $T^2$ control chart constructed from the 45°, 90°, and 135° corner-deviation variables. All reference observations fell below the upper control limit, indicating that the reference process was stable across the multivariate feature set used to define the reference distributions (Figure 12). Candidate-process Hotelling $T^2$ charts were also generated as diagnostic checks. The AconityMIDI+ candidate had one observation above the upper control limit. A root-cause analysis was performed for this observation and found no evidence of a build, measurement, or data-processing issue. Because this was a single event among 40 observations and did not indicate repeated or systematic instability, the candidate process was considered to remain sufficiently in control for the validation analysis. These charts are useful for identifying observations that may warrant follow-up investigation, but the final validation decision is based on the feature-wise equivalency tests defined by the reference-process bins.
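The $T^2$ statistic behind these charts can be sketched as follows, assuming individual observations are compared with the reference mean vector and sample covariance matrix. The helper names (`mean_vector`, `covariance`, `solve`, `hotelling_t2`, `out_of_control`), the simulated data, and the placeholder limit `ucl=11.0` are hypothetical. In practice the upper control limit would come from a distributional result, for example the Phase I beta-based limit $\frac{(m-1)^2}{m}\,B_{1-\alpha}\left(\frac{p}{2}, \frac{m-p-1}{2}\right)$, where $B_{1-\alpha}$ is the $(1-\alpha)$ quantile of the beta distribution (computable with `scipy.stats.beta.ppf`); which limit the package uses is not stated here.

```python
import random

def mean_vector(rows):
    """Column means of a list of equal-length observation vectors."""
    m = len(rows)
    return [sum(col) / m for col in zip(*rows)]

def covariance(rows, center):
    """Sample covariance matrix with divisor m - 1."""
    m, p = len(rows), len(center)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b] / (m - 1)
    return cov

def solve(matrix, vector):
    """Solve matrix @ x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [list(matrix[i]) + [vector[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def hotelling_t2(rows, center, cov):
    """T^2_i = (x_i - center)' cov^{-1} (x_i - center) for every observation."""
    t2 = []
    for r in rows:
        d = [x - c for x, c in zip(r, center)]
        w = solve(cov, d)
        t2.append(sum(di * wi for di, wi in zip(d, w)))
    return t2

def out_of_control(t2_values, ucl):
    """Indices of observations above the upper control limit."""
    return [i for i, t in enumerate(t2_values) if t > ucl]

# Hypothetical data: 40 simulated reference rows near the reported 45°, 90°, 135° means.
rng = random.Random(1)
reference = [[rng.gauss(mu, 10.0) for mu in (229.62, 162.24, 80.85)] for _ in range(40)]
center = mean_vector(reference)
cov = covariance(reference, center)
t2_reference = hotelling_t2(reference, center, cov)
print(out_of_control(t2_reference, ucl=11.0))  # ucl is a placeholder, not a derived limit
```

A useful sanity check is that, when the center and covariance are estimated from the same $m$ rows, the $T^2$ values sum to $(m-1)p$, which equals 117 for $m = 40$ and $p = 3$.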

The candidate observations were then assigned to percentile bins defined by the reference process for each feature. Under equivalency, candidate observations should be approximately uniformly distributed across these reference-defined bins. The expected-equivalent AconityMIDI+ candidate showed counts distributed across all or nearly all bins for each feature, whereas the SLM280 HL candidate showed strong concentration in tail bins, especially for the 90° and 135° features (Figure 13).
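The binning step can be sketched for a single feature as below. The sketch assumes $b$ equal-probability bins cut at the empirical reference percentiles and scores occupancy with a goodness-of-fit Cramér's V against equal counts, $V = \sqrt{\chi^2 / (n(b-1))}$. This plain V is an assumption: the $\tilde{V}$ reported in Table 4 may include a correction that is not reproduced here. The choice $b = 8$ and the simulated data are hypothetical.

```python
import bisect
import math
import random
import statistics

def reference_bin_edges(reference, n_bins):
    """Interior cut points at the 1/b, ..., (b-1)/b empirical percentiles of the reference feature."""
    return statistics.quantiles(reference, n=n_bins, method="inclusive")

def bin_counts(candidate, edges):
    """Number of candidate observations falling in each reference-defined bin."""
    counts = [0] * (len(edges) + 1)
    for x in candidate:
        counts[bisect.bisect_right(edges, x)] += 1
    return counts

def cramers_v_uniform(counts):
    """Goodness-of-fit Cramér's V against equal occupancy: sqrt(chi2 / (n * (k - 1)))."""
    n, k = sum(counts), len(counts)
    expected = n / k
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical single-feature example with b = 8 bins and 40 observations per process.
rng = random.Random(3)
reference = [rng.gauss(162.0, 10.0) for _ in range(40)]
similar = [rng.gauss(162.0, 10.0) for _ in range(40)]
shifted = [rng.gauss(192.0, 10.0) for _ in range(40)]
edges = reference_bin_edges(reference, 8)
for label, cand in (("similar", similar), ("shifted", shifted)):
    counts = bin_counts(cand, edges)
    print(label, counts, round(cramers_v_uniform(counts), 3))
```

Concentrating every candidate observation in a single bin gives $V = 1$, the upper end of the scale and consistent with the SLM280 HL 135° value in Table 4, whereas perfectly balanced counts give $V = 0$.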

Feature-wise equivalency results are reported in Table 4. For the expected-equivalent AconityMIDI+ candidate, all three feature-wise comparisons satisfied the equivalency criterion: 45° ($\tilde{V} = 0.207 < CI^{+} = 0.276$), 90° ($\tilde{V} = 0.214 < CI^{+} = 0.276$), and 135° ($\tilde{V} = 0.207 < CI^{+} = 0.276$). Because all three features satisfied the feature-wise criterion under the Šidák-adjusted significance level $\alpha' = 0.00334$, the independent multivariate decision classified the AconityMIDI+ candidate as equivalent to the reference process.
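The Šidák adjustment sets the per-feature level to $\alpha' = 1 - (1-\alpha)^{1/p}$. Assuming a family-wise level of $\alpha = 0.01$ (the value used in Appendix A) and $p = 3$ features, the short sketch below reproduces the reported $\alpha' \approx 0.00334$; the function name is hypothetical.

```python
def sidak_alpha(family_alpha, n_tests):
    """Per-feature level giving family-wise level family_alpha across n_tests independent tests."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / n_tests)

alpha_prime = sidak_alpha(0.01, 3)
print(f"{alpha_prime:.5f}")  # 0.00334
# Check: probability of at least one false rejection across three independent features.
print(round(1.0 - (1.0 - alpha_prime) ** 3, 6))  # 0.01
```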

In contrast, the SLM280 HL candidate was classified as non-equivalent for all three corner-deviation features. This result was expected physically because the SLM280 HL and AconityMIDI+ systems differ in accessible delay settings, scanner hardware, galvanometer dynamics, machine configuration, and laser/scanner geometry relative to the scan plate. The 45° feature exceeded the decision threshold ($\tilde{V} = 0.357 > CI^{+} = 0.276$), while the 90° and 135° features showed much larger deviations from the reference-defined bins ($\tilde{V} = 0.944 > CI^{+} = 0.276$ and $\tilde{V} = 1.000 > CI^{+} = 0.276$, respectively). Under the independent multivariate decision rule, a candidate process is classified as non-equivalent if any feature fails the feature-wise equivalency test. The SLM280 HL candidate therefore fails the multivariate equivalency test. Together, these results show that the workflow correctly retained the expected-equivalent candidate while rejecting the expected non-equivalent candidate.
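The independent decision rule can be expressed directly on the Table 4 values. The sketch below uses only the reported $\tilde{V}$ statistics and the common threshold $CI^{+} = 0.276$; the function and variable names are hypothetical.

```python
CI_UPPER = 0.276  # reported threshold, identical for all three features

# Reported feature-wise statistics from Table 4.
table4 = {
    "AconityMIDI+": {"45°": 0.207, "90°": 0.214, "135°": 0.207},
    "SLM280 HL": {"45°": 0.357, "90°": 0.944, "135°": 1.000},
}

def independent_decision(v_by_feature, threshold):
    """Equivalent only if every feature passes; a single failing feature means non-equivalent."""
    failing = [feature for feature, v in v_by_feature.items() if not v < threshold]
    return ("equivalent" if not failing else "non-equivalent"), failing

for process, v_by_feature in table4.items():
    decision, failing = independent_decision(v_by_feature, CI_UPPER)
    print(process, decision, failing)
```

The AconityMIDI+ candidate passes all three features, whereas the SLM280 HL candidate fails all three, although one failing feature would already be sufficient for a non-equivalent classification.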

6. Discussion

Multivariate equivalency is a practical component of broader efforts to reduce the cost of evidence generation in AM requalification, process comparability, change control, and other quality-assurance contexts. Consistent with the univariate framework on which it is based [20], the method is intended for settings in which prior engineering evidence already suggests that a candidate process should be comparable to a stable reference process. The method is therefore a screening tool for detecting evidence of non-equivalency, not a stand-alone proof of physical-process identity and not a replacement for full first-time qualification under existing specification-, allowables-, or certification-based workflows.

The proposed framework should also be interpreted relative to alternative multivariate comparison strategies. Methods based on direct comparisons of continuous or joint distributions can be attractive when large datasets are available or when the primary goal is omnibus detection of any distributional difference. By contrast, the emphasis here is on a design-oriented workflow for low-sample, high-cost AM settings in which practitioners must make explicit trade-offs among detectable effect size, family-wise error control, and the monetary costs of producing and measuring artifacts. Consequently, multivariate equivalency is not presented as the unique or universally optimal approach to process comparison. It is instead a practical nonparametric option when a reference-process framework is operationally meaningful and experimental resources are constrained. Appendix A provides a structured simulation-based comparison with MANOVA-style methods, Hotelling-type tests, energy-distance and maximum-mean-discrepancy approaches, permutation-test implementations of those omnibus statistics, and reference-interval screening, with the simulation results summarized in Figure A1.

Traditional qualification approaches, such as those prescribed in the MMPDS Handbook [44], can be cost-prohibitive when applied to AM. For example, establishing A-basis allowable values for tensile strength requires printing at least 100 specimens under a parametric assumption, or up to 299 specimens when a nonparametric approach is used. Based on a single source of facility-specific cost estimates, the cost of printing a single artifact, including material, energy, and labor, was estimated as $4,165, and tensile testing incurs a cost of $80 per sample. The total expenditure for this procedure therefore ranges from $424,500 (100 specimens) to $1,269,255 (299 specimens) for a single measurement type. By comparison, printing 40 artifacts and evaluating 9 features yields a total cost equivalent to approximately 68.80 artifacts in the illustrative cost model, corresponding to $286,552 under the same facility-specific estimates. This comparison is intended only as an illustrative cost scale; it does not imply that equivalency testing replaces the generation of material allowables or direct conformance to qualification standards.
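The cost figures above follow from simple arithmetic on the facility-specific estimates, as the short check below shows. It reproduces only the quoted totals; the 68.80 artifact-equivalent figure is taken from the illustrative cost model as reported, not derived here.

```python
ARTIFACT_COST = 4165     # USD per printed artifact (facility-specific estimate)
TENSILE_TEST_COST = 80   # USD per tensile test

def allowables_cost(n_specimens):
    """Print plus tensile-test cost for an allowables campaign of n_specimens."""
    return n_specimens * (ARTIFACT_COST + TENSILE_TEST_COST)

print(allowables_cost(100))            # 424500  (parametric A-basis minimum)
print(allowables_cost(299))            # 1269255 (nonparametric A-basis)
print(round(68.80 * ARTIFACT_COST))    # 286552  (reported artifact-equivalent cost)
```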

The simplified cost model is useful because it exposes the design trade-off between fabrication and measurement. It is deliberately not a full economic model of AM production. In real workflows, build-level batching, setup costs, machine amortization, powder reuse, technician time, post-processing, metrology throughput, inspection queues, and destructive-versus-nondestructive measurement choices can make costs nonlinear. In the present framework, the cost model should therefore be read as a transparent mean-field design approximation that helps compare candidate designs under specified assumptions, with process-specific economic models remaining an important future extension.

The validation study illustrates the practical value of measuring multiple responses simultaneously. The SLM280 HL candidate had a 45° mean that was close to the reference mean, so a comparison based only on that mean or only on that feature could have suggested process similarity. The multivariate equivalency workflow instead detected non-equivalency, driven especially by strong shifts in the 90° and 135° features and by the distributional structure of the 45° feature. This result supports the intended use of the method as a requalification screening tool: it increases the chance that relevant process departures are detected when a single measured feature is insensitive to the underlying change.

Several limitations should be emphasized. First, the independent feature-wise method controls the family-wise probability of false rejection across measured attributes, but it does not prove equivalency of the full joint distribution. Two processes could have similar marginal distributions while differing in dependence structure. The dependent bivariate extension provides a first step toward joint-distribution equivalency, but high-dimensional dependence-aware extensions remain computationally challenging because direct joint binning can produce sparse cells and rapidly increasing sample-size requirements. Second, the effect-size parameter $\lambda$ is a simulation parameter rather than a physical AM parameter. Larger values of $\lambda$ represent stronger concentration of candidate observations into a subset of reference-defined bins, which can correspond to systematic process shifts in porosity, roughness, residual stress, geometry, or mechanical properties caused by process parameter or subsystem changes. Third, the current workflow makes decisions from observed data and therefore combines intrinsic process variation with measurement-system uncertainty. Appendix B outlines a measurement-error sensitivity framework for evaluating when metrology noise materially changes sample-size, power, or cost conclusions.
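A small constructed example shows why matching marginals do not imply matching joint distributions. Assuming a symmetric sample $s$ (so that $s$ and $-s$ contain the same values), pairing each value with itself or with its negative yields two hypothetical bivariate processes whose features are perfectly positively and perfectly negatively correlated, yet every feature-wise test receives identical inputs. The final loop shows the $b^p$ cell count that makes direct joint binning sparse, using $b = 8$ and a candidate sample of 40 as illustrative values.

```python
import math
import random

def correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    return sab / math.sqrt(saa * sbb)

# Hypothetical symmetric sample: the values and their negatives form the same multiset.
rng = random.Random(7)
half = [rng.gauss(0.0, 1.0) for _ in range(100)]
s = half + [-v for v in half]
process_a = [(v, v) for v in s]    # features move together
process_b = [(v, -v) for v in s]   # features move in opposite directions

same_marginals = all(
    sorted(r[j] for r in process_a) == sorted(r[j] for r in process_b) for j in range(2)
)
print(same_marginals, round(correlation(process_a), 3), round(correlation(process_b), 3))

# Direct joint binning uses b**p cells; with b = 8 the mean occupancy collapses quickly.
for p in (1, 2, 3):
    print(p, 8 ** p, round(40 / 8 ** p, 3))  # features, cells, mean candidate count per cell
```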

The open-source R package MultivariateEquivalency implements multivariate stability analysis via Hotelling's $T^2$ control charts, bin construction, simulation-based power analysis, cost estimation, and equivalency testing [64]. These power analyses function like traditional power analyses: any two of the three design variables (sample size, number of features, or effect size) can be specified, and the third is estimated via simulation.
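The simulation-based power analysis described above can be sketched as follows for the case in which sample size is solved for, given the number of features and the effect size. This is not the package's implementation. In particular, the effect model `tilted_probs`, in which bin probabilities are proportional to $\exp(\lambda k/(b-1))$ for bin index $k$, is a hypothetical stand-in for the paper's $\lambda$ parameterization; every feature is assumed to shift by the same $\lambda$; and the threshold is calibrated by simulating the maximum feature-wise statistic under uniform occupancy, which folds family-wise control into a single cut-off instead of applying an explicit Šidák level.

```python
import math
import random

def cramers_v_uniform(counts):
    """Goodness-of-fit Cramér's V against equal bin occupancy."""
    n, k = sum(counts), len(counts)
    expected = n / k
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return math.sqrt(chi2 / (n * (k - 1)))

def tilted_probs(n_bins, lam):
    """Hypothetical effect model: lam = 0 is uniform; larger lam concentrates mass in upper bins."""
    weights = [math.exp(lam * k / (n_bins - 1)) for k in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_counts(rng, probs, n):
    """Multinomial bin counts for n candidate observations."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[k] += 1
    return counts

def max_feature_v(rng, probs, n, n_features):
    """Largest per-feature statistic, with features simulated independently."""
    return max(cramers_v_uniform(sample_counts(rng, probs, n)) for _ in range(n_features))

def estimate_power(n, n_features, lam, n_bins=8, alpha=0.01, reps=2000, seed=0):
    """Share of shifted simulations whose max statistic exceeds the simulated null threshold."""
    rng = random.Random(seed)
    uniform = tilted_probs(n_bins, 0.0)
    null = sorted(max_feature_v(rng, uniform, n, n_features) for _ in range(reps))
    threshold = null[math.ceil((1 - alpha) * reps) - 1]  # empirical (1 - alpha) quantile
    shifted = tilted_probs(n_bins, lam)
    hits = sum(max_feature_v(rng, shifted, n, n_features) > threshold for _ in range(reps))
    return hits / reps

def smallest_sample_size(n_features, lam, target_power=0.8, grid=range(10, 201, 10), **kwargs):
    """Scan a grid of candidate sample sizes and return the first one reaching the target power."""
    for n in grid:
        if estimate_power(n, n_features, lam, **kwargs) >= target_power:
            return n
    return None

print(smallest_sample_size(n_features=3, lam=3.0, reps=500))
```

Fixing `n` and scanning `lam`, or fixing `n` and `lam` while varying `n_features`, gives the other two directions of the same trade-off.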

Future research should proceed in several directions. Feature selection should be integrated more directly into study design because redundant, noisy, or weakly informative variables can increase cost without improving discriminatory power. Anomaly detection should also remain a complementary diagnostic step, since AM anomalies may reflect spatter events, recoater interactions, transient process excursions, measurement artifacts, or joint multivariate outliers that are not fully resolved by percentile-bin equivalency alone [65,66,67,68,69,70]. Dependent multivariate equivalency should be extended beyond the bivariate case so that joint-distribution similarity can be assessed without an unmanageable explosion in the number of cells. Batch-aware and group-sequential extensions should distinguish build-level costs from specimen-level and feature-level measurement costs, especially in AM workflows where multiple specimens are fabricated together before interim measurements are available. Measurement-system analysis should be incorporated so that observed differences can be decomposed into process variation and metrology variation. Finally, artificial intelligence and neural-network models could complement the proposed framework by supporting feature selection from high-dimensional metrology or in situ monitoring data, learning low-dimensional embeddings before equivalency testing, identifying anomalies, or building surrogate models for cost-aware experimental design. Such models would need careful validation, uncertainty quantification, and interpretability before use in qualification-adjacent decision-making.

7. Conclusions

This study extended a previously published univariate equivalency framework to the multivariate setting while preserving its intended requalification and change-control framing. The main contribution is a nonparametric, sample-size-explicit screening workflow for detecting evidence of non-equivalency across multiple measured AM attributes when comparison to a known-good reference process is operationally meaningful and testing costs constrain the design.

The main conclusions are as follows:

The independent multivariate method provides a scalable first-line approach for multifeature requalification screening by combining feature-wise percentile-bin tests with Šidák family-wise error-rate control.
Direct joint-binning approaches can become sample-intensive as dimensionality increases, whereas the independent feature-wise method remains computationally and experimentally feasible for larger feature sets.
The illustrative cost model shows how multifeature measurement can reduce the number of artifacts required for a target power level, although the model should be interpreted as a mean-field design approximation rather than a complete AM economic model.
The PBF-LB/M validation study correctly retained the expected-equivalent AconityMIDI+ candidate and rejected the expected non-equivalent SLM280 HL candidate using three corner-deviation features.

Scientifically, the work contributes a multifeature, distributional, power-aware framework for AM process comparability that connects statistical decision rules to experimental design and cost estimation. From an industrial and societal perspective, reducing unnecessary fabrication and testing during requalification can lower material use, machine time, destructive testing burden, and the cost of generating engineering evidence for AM process changes. The method should nevertheless be used as one statistical component of a broader engineering evidence package, with dependence-based extensions, measurement-system analysis, anomaly diagnostics, and process-specific economic models treated as important future developments.

Author Contributions

C.M.L.: Conceptualization, Methodology, Software, Data curation, Writing—original draft, Visualization. R.V.: Conceptualization, Methodology, Supervision, Validation, Writing—review and editing. B.L.V.M.: Writing—review and editing, Experimental Design, Methodology. C.G.G.: Software, Formal Analysis. J.M.: Writing—review and editing, Experimental Design, Methodology. R.B.W.: Investigation, Conceptualization, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

The research described here was performed at The University of Texas at El Paso (UTEP) within the W.M. Keck Center for 3D Innovation (Keck Center). This material is based on research sponsored by Air Force Research Laboratory under Agreement Number FA8650-20-2-5700. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Additional support was provided by strategic investments via discretionary UTEP Keck Center funds and the Mr. and Mrs. MacIntosh Murchison Chair I in Engineering Endowment at UTEP. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory or the U.S. Government.

Data Availability Statement

All data supporting the findings of this study are contained within this article. Code and package resources are available at https://github.com/colinmichaellynch/MultivariateEquivalency (accessed on 25 June 2026).

Acknowledgments

The authors acknowledge the use of generative artificial intelligence for language editing and code development. During the preparation of this manuscript, the authors used ChatGPT (OpenAI, GPT-5.2, 2026) to assist with editing and refining text and for generating and debugging functions used in the computational analyses. The authors have reviewed and edited all AI-generated output and take full responsibility for the content of this publication.

Conflicts of Interest

Author Colin Lynch was employed as a contractor by Terra Integrated Solutions. Author J. Rene Villalobos is the founder of Terra Integrated Solutions. Author Brenda Valadez was employed by Tailored Alloys. Author Jorge Mireles was employed by Additive Manufacturing Education Partners. Author Ryan Wicker was employed by Additive Manufacturing Education Partners and is the co-founder of Trusted Metal. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AM	Additive Manufacturing
PBF-LB	Powder Bed Fusion–Laser Beam
MMPDS	Metallic Materials Properties Development and Standardization
NDE	Non-Destructive Evaluation
SPC	Statistical Process Control
DOE	Design of Experiments
KS	Kolmogorov–Smirnov
PDF	Probability Density Function
CDF	Cumulative Distribution Function
CI	Confidence Interval
ANOVA	Analysis of Variance
IQ	Installation Qualification
OQ	Operational Qualification
PQ	Performance Qualification
ISO	International Organization for Standardization
ASTM	ASTM International
NASA	National Aeronautics and Space Administration
SAE	SAE International
GenAI	Generative Artificial Intelligence

Appendix A. Comparison with Alternative Multivariate Comparison Methods

Several established multivariate comparison methods could be considered for AM process comparison, but they answer different statistical questions and impose different assumptions. The proposed equivalency framework was designed for requalification and change-control settings in which a stable reference process exists, sample sizes are limited, and the practitioner needs a cost-aware design rule for detecting evidence of non-equivalency. Table A1 summarizes how the framework differs from the alternative method classes that were included in the Appendix A simulations.

Table A1. Conceptual comparison between the proposed multivariate equivalency framework and the alternative multivariate comparison methods included in the Appendix A simulations. The table emphasizes the primary question answered by each method class, the setting in which the method is most useful, and the limitation most relevant to the requalification-screening problem addressed in this manuscript.

Method Class	Primary Question	Useful When	Limitation Relative to the Proposed Framework
Reference-bin multivariate equivalency	Does the candidate distribute across reference-defined percentile bins as expected for each measured feature, with calibrated family-wise behavior?	The decision is a cost-conscious requalification or change-control screen against a stable reference process.	Does not prove full joint-distribution identity; the independent implementation remains feature-wise.
MANOVA/Hotelling-style tests	Do the candidate and reference have different mean vectors?	Mean-vector shifts are the primary concern, and the multivariate-normality and equal-covariance assumptions are acceptable.	Can miss variance, tail, or shape changes that do not strongly affect the mean vector.
Energy distance/MMD	Do the candidate and reference appear to come from different multivariate distributions?	A broad two-sample distributional difference is the primary question.	Often detects generic differences rather than directly supporting a reference-bin, cost-aware requalification rule.
Permutation tests	Is the observed group difference large relative to label-randomized datasets for a chosen test statistic?	A valid resampling reference distribution can be constructed for a selected statistic.	The result depends on the chosen statistic and may not directly connect to sample-size/cost design.
Tolerance or reference intervals	Do candidate observations remain inside an acceptable reference region or interval?	Engineering limits or population coverage requirements are already defined.	May not compare the full distributional structure and can depend strongly on interval construction.

To complement the conceptual comparison, a simulation study was conducted to compare the operating characteristics of several method classes under controlled departures from an equivalent reference process. The simulation was intentionally stylized because the goal was not to reproduce the full AM validation case study, but to clarify which kinds of distributional departures each method is most sensitive to. In the baseline setting, the reference sample contained $n_R = 300$ observations, the candidate sample contained $n_C = 40$ observations, the number of measured features was $p = 3$, the common feature correlation was $\rho = 0.50$, the reference-bin resolution was $b = 8$, and the target family-wise false detection rate was $\alpha = 0.01$. The simulation used 150 Monte Carlo iterations for each departure setting, 499 permutations for the energy-distance and MMD tests, and 2000 null simulations to calibrate the reference-bin threshold.
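For readers unfamiliar with the omnibus comparators, a minimal permutation energy-distance test is sketched below. It uses the V-statistic form of the Székely and Rizzo energy distance, $2E\lVert X-Y\rVert - E\lVert X-X'\rVert - E\lVert Y-Y'\rVert$, and a permutation p-value of $(1 + \#\{\text{permuted} \ge \text{observed}\})/(B + 1)$. The sample sizes and permutation count are reduced from the appendix settings because pure-Python pairwise distances scale quadratically; the exact implementation used in the appendix simulations is not specified here, and the data are hypothetical.

```python
import math
import random

def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x in a, y in b)."""
    return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))

def energy_statistic(x, y):
    """Energy distance between samples (V-statistic form, diagonal terms included)."""
    return 2 * mean_pairwise_distance(x, y) - mean_pairwise_distance(x, x) - mean_pairwise_distance(y, y)

def energy_permutation_test(x, y, n_perm=499, seed=0):
    """Permutation p-value: share of label-shuffled statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = energy_statistic(x, y)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical 3-feature samples with a mean shift in the first feature only.
rng = random.Random(11)
reference = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(30)]
candidate = [[rng.gauss(0, 1) + (1.5 if j == 0 else 0.0) for j in range(3)] for _ in range(20)]
stat, p_value = energy_permutation_test(reference, candidate, n_perm=199)
print(round(stat, 3), p_value)
```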

Reference and candidate observations were generated from a multivariate normal distribution with exchangeable correlation. Under the no-shift condition, the candidate and reference samples were drawn from the same distribution. Three departure types were then imposed on the first feature of the candidate process: (i) a mean shift, implemented as an additive standardized displacement; (ii) a variance shift, implemented as a multiplicative increase in the feature standard deviation; and (iii) a tail/shape shift, implemented as symmetric contamination in which a proportion of observations was displaced into the tails. For each simulated comparison, all methods returned a binary decision indicating whether a difference or non-equivalency signal was detected.
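The data-generating process can be sketched as below. Exchangeable correlation is obtained from a shared standard-normal factor, $X_j = \sqrt{\rho}\,Z_0 + \sqrt{1-\rho}\,Z_j$, which gives unit variances and common pairwise correlation $\rho$ for $0 \le \rho < 1$. The departure implementations follow the verbal descriptions above, but the exact contamination mechanism and the hypothetical `tail_offset = 3.0` are assumptions.

```python
import math
import random

def exchangeable_normal(rng, n, p, rho):
    """n rows of p standard-normal features with common pairwise correlation rho (0 <= rho < 1)."""
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor induces the exchangeable correlation
        rows.append([math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0) for _ in range(p)])
    return rows

def apply_departure(rows, kind, size, rng, tail_offset=3.0):
    """Return a copy of rows with a departure imposed on the first feature only."""
    out = [list(r) for r in rows]
    for r in out:
        if kind == "mean":        # additive standardized displacement
            r[0] += size
        elif kind == "variance":  # multiplicative increase in the (zero-mean) feature's SD
            r[0] *= size
        elif kind == "tail":      # with probability `size`, push the value outward into a random tail
            if rng.random() < size:
                sign = 1.0 if rng.random() < 0.5 else -1.0
                r[0] = sign * (tail_offset + abs(r[0]))
        elif kind != "none":
            raise ValueError(f"unknown departure type: {kind}")
    return out

rng = random.Random(5)
reference = exchangeable_normal(rng, 300, 3, 0.50)
candidate = apply_departure(exchangeable_normal(rng, 40, 3, 0.50), "variance", 1.5, rng)
```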

The reference-bin method used the same logic as the main manuscript but added a finite-reference calibration step. Candidate observations were assigned to empirical percentile bins defined by the finite reference sample, a Cramér-type statistic was computed for each feature, and the maximum feature-wise statistic was used as the multivariate screening statistic. Because empirical reference-bin boundaries contain sampling uncertainty, the rejection threshold was not taken directly from the analytical chi-squared approximation. Instead, equivalent reference/candidate pairs were repeatedly simulated under the no-shift condition, and the 99th percentile of the resulting null distribution of the maximum feature-wise statistic was used as the calibrated decision threshold. This calibration makes the Appendix A comparison more appropriate for finite-reference settings, where the reference distribution is estimated rather than known exactly.
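The finite-reference calibration can be sketched by repeating the full binning procedure on equivalent reference/candidate pairs and taking the empirical 99th percentile of the maximum feature-wise statistic. The helper names and the plain goodness-of-fit Cramér's V used as the feature-wise statistic are assumptions; only the baseline settings ($n_R = 300$, $n_C = 40$, $p = 3$, $\rho = 0.50$, $b = 8$, $\alpha = 0.01$, 2000 null simulations) come from the appendix.

```python
import bisect
import math
import random
import statistics

def cramers_v_uniform(counts):
    """Goodness-of-fit Cramér's V against equal bin occupancy."""
    n, k = sum(counts), len(counts)
    expected = n / k
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return math.sqrt(chi2 / (n * (k - 1)))

def feature_statistic(reference_col, candidate_col, n_bins):
    """Bin the candidate feature by empirical reference percentiles, then score occupancy."""
    edges = statistics.quantiles(reference_col, n=n_bins, method="inclusive")
    counts = [0] * n_bins
    for x in candidate_col:
        counts[bisect.bisect_right(edges, x)] += 1
    return cramers_v_uniform(counts)

def max_feature_statistic(reference, candidate, n_bins):
    """Multivariate screening statistic: the largest feature-wise statistic."""
    p = len(reference[0])
    return max(
        feature_statistic([r[j] for r in reference], [c[j] for c in candidate], n_bins)
        for j in range(p)
    )

def exchangeable_normal(rng, n, p, rho):
    """Standard-normal rows with common pairwise correlation rho, via a shared factor."""
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        rows.append([math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0) for _ in range(p)])
    return rows

def calibrated_threshold(n_ref=300, n_cand=40, p=3, rho=0.50, n_bins=8, alpha=0.01, n_null=2000, seed=0):
    """Empirical (1 - alpha) quantile of the max statistic over equivalent reference/candidate pairs."""
    rng = random.Random(seed)
    null = sorted(
        max_feature_statistic(
            exchangeable_normal(rng, n_ref, p, rho), exchangeable_normal(rng, n_cand, p, rho), n_bins
        )
        for _ in range(n_null)
    )
    return null[math.ceil((1 - alpha) * n_null) - 1]

threshold = calibrated_threshold(n_null=500)
print(round(threshold, 3))
```

A candidate would then be flagged when `max_feature_statistic(reference, candidate, 8)` exceeds the calibrated threshold. Because the empirical bin edges are re-estimated in every null draw, their sampling uncertainty is carried into the threshold. The call above uses fewer null simulations than the appendix to keep the runtime short.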

The false detection results in Figure A1A show that the calibrated reference-bin method operated close to the nominal $\alpha = 0.01$ target under the no-shift condition. This is important because an uncalibrated reference-bin implementation can be anti-conservative when empirical reference bins are treated as if they were exact population percentile boundaries. The finite-reference calibration step addresses this issue by incorporating uncertainty from the estimated bin cutpoints, candidate sampling variation, and the maximum-across-features decision rule. The remaining small deviations from the nominal line should be interpreted in light of the finite number of Monte Carlo iterations used for this illustrative appendix simulation.
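The size of those Monte Carlo deviations can be gauged from the binomial standard error of an estimated rate, $\sqrt{\alpha(1-\alpha)/N}$. With $\alpha = 0.01$ and $N = 150$ iterations, the standard error is about 0.0081, so an observed false detection rate anywhere from 0 to roughly 0.026 is compatible with the nominal level under a normal approximation; the sketch below performs the arithmetic.

```python
import math

def mc_standard_error(rate, n_iter):
    """Binomial standard error of a rate estimated from n_iter independent Monte Carlo iterations."""
    return math.sqrt(rate * (1.0 - rate) / n_iter)

se = mc_standard_error(0.01, 150)
upper = 0.01 + 1.96 * se  # approximate upper edge of a 95% band around the nominal rate
print(round(se, 4), round(upper, 3))  # 0.0081 0.026
```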

The operating-characteristic results show that no method dominated across all departure types. Mean-vector methods, especially the MANOVA/Wilks and Hotelling-style tests, were most sensitive to mean shifts. This is expected because those tests are designed to detect differences in multivariate location. However, their sensitivity was much lower for variance and tail/shape departures, illustrating why a mean-vector test alone may be insufficient when AM process changes can alter variability, tails, or distributional occupancy without producing a large mean shift.

Figure A1. Method-comparison simulation for Appendix A. Panel (A) shows the false detection rate under the no-shift condition; the dashed vertical line marks the target family-wise rate of $\alpha = 0.01$. Panel (B) shows detection rates as a function of effect magnitude for mean, variance, and tail/shape departures. Panel (C) shows detection rates as a function of candidate-process sample size for moderate departures. The reference-bin equivalency method used a finite-reference calibration threshold based on equivalent reference/candidate simulations. The results are intended to show that the methods answer different statistical questions rather than to identify a universally superior procedure.

For variance shifts, the empirical reference-interval screen produced the strongest detection rates because the simulated change directly increased the probability that candidate observations fell outside the reference-defined marginal range. The reference-bin method also increased in detection rate as the variance departure grew, but it was less aggressive than the reference-interval screen. This pattern reflects the different decision logic of the two methods: the interval screen is sensitive to boundary exceedances, whereas the reference-bin method evaluates whether candidate observations are distributed across the full reference-bin structure as expected.
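
The boundary-exceedance logic can be sketched as follows. The appendix does not state the exact decision rule of the reference-interval screen. The version below is therefore hypothetical: it flags a candidate when the largest per-feature count of observations outside the reference min–max range exceeds a simulated $(1-\alpha)$ quantile under equivalence. It relies on a standard fact: for continuous i.i.d. data, a new draw falls outside the range of $n_{\mathrm{ref}}$ reference draws with probability $2/(n_{\mathrm{ref}}+1)$.

```python
import numpy as np


def exceedance_counts(reference, candidate):
    """Per-feature number of candidate observations outside the reference min-max range."""
    lo = reference.min(axis=0)
    hi = reference.max(axis=0)
    return ((candidate < lo) | (candidate > hi)).sum(axis=0)


def expected_exceedances(n_ref, n_cand):
    """Expected per-feature count when candidate and reference share a continuous distribution."""
    return 2.0 * n_cand / (n_ref + 1)


def calibrated_max_count(n_ref, n_cand, p, alpha=0.01, n_sim=2000, seed=None):
    """(1 - alpha) quantile of the largest per-feature exceedance count under equivalence.

    Counts depend only on ranks, so independent uniform features suffice for the simulation.
    """
    rng = np.random.default_rng(seed)
    null = np.array([
        exceedance_counts(rng.random((n_ref, p)), rng.random((n_cand, p))).max()
        for _ in range(n_sim)
    ])
    return np.quantile(null, 1.0 - alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    reference = rng.standard_normal((40, 3))
    candidate = 1.8 * rng.standard_normal((40, 3))  # hypothetical variance increase
    counts = exceedance_counts(reference, candidate)
    limit = calibrated_max_count(40, 40, 3, seed=5)
    print(counts, round(expected_exceedances(40, 40), 2), counts.max() > limit)
```

The counts are discrete, so the achieved false detection rate of this rule is approximately at or below $\alpha$ rather than exactly $\alpha$.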

For tail/shape shifts, MMD and the reference-interval screen showed high sensitivity, while energy distance and the reference-bin method showed moderate sensitivity and the mean-vector methods remained comparatively weak. This result is consistent with the interpretation that omnibus two-sample tests and interval-based screens can be useful diagnostics when the dominant concern is a generic distributional or tail departure. However, those methods do not automatically provide the same reference-bin, feature-wise, sample-size, and cost-design interpretation used by the proposed equivalency workflow.
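
Omnibus tests such as MMD compare entire multivariate distributions, which explains why they react to tail and shape changes. The sketch below shows a Gaussian-kernel MMD test with a median-heuristic bandwidth and a permutation p-value. This is one common configuration; the kernel, bandwidth rule, and permutation count used in the appendix simulation are not stated here, so these choices are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist


def _mmd2(k, idx_x, idx_y):
    """Biased (V-statistic) squared MMD computed from a precomputed kernel matrix."""
    return (k[np.ix_(idx_x, idx_x)].mean() + k[np.ix_(idx_y, idx_y)].mean()
            - 2.0 * k[np.ix_(idx_x, idx_y)].mean())


def mmd_permutation_test(x, y, n_perm=500, seed=None):
    """Gaussian-kernel MMD two-sample test with a permutation p-value."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    sq = cdist(z, z, "sqeuclidean")
    bandwidth2 = np.median(sq[np.triu_indices_from(sq, k=1)])  # median squared pairwise distance
    k = np.exp(-sq / (2.0 * bandwidth2))
    n, m = len(x), len(z)
    observed = _mmd2(k, np.arange(n), np.arange(n, m))
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(m)  # relabel pooled observations under the null
        exceed += _mmd2(k, perm[:n], perm[n:]) >= observed
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(6)
    reference = rng.standard_normal((40, 3))
    candidate = rng.standard_t(df=3, size=(40, 3))  # hypothetical heavier-tailed candidate
    stat, p_value = mmd_permutation_test(reference, candidate, seed=7)
    print(f"MMD^2 = {stat:.4f}, permutation p = {p_value:.3f}")
```

A small p-value here says only that the two samples differ somewhere. It does not indicate which feature or which reference bins are responsible, which is the feature-wise, reference-bin interpretation noted above.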

The sample-size results in Figure A1C further reinforce that method performance depends on the departure being targeted. Increasing candidate-process sample size improved detection rates for most methods, but the rate of improvement differed by method class and departure type. Mean-based methods reached high detection rates for mean shifts, MMD improved substantially for tail/shape shifts, and the reference-interval screen remained particularly sensitive to variance and tail/shape departures. The reference-bin method improved with sample size but remained comparatively conservative in this illustrative setting, which is appropriate for a screening method designed to control false non-equivalency conclusions while preserving a direct connection to the reference-bin decision rule.

Overall, the Appendix A comparison supports the interpretation used in the main manuscript: the proposed multivariate equivalency framework should not be viewed as a universal replacement for all multivariate comparison methods. Instead, it answers a specific engineering question: whether a candidate process shows evidence of non-equivalency relative to a stable reference-bin distribution under a cost-conscious requalification or change-control workflow. If the scientific or regulatory claim concerns only mean-vector differences, MANOVA or Hotelling-style methods may be more direct. If the claim concerns broad two-sample distributional differences, energy-distance or MMD-style tests may be useful complements. If the claim concerns conformance to specified coverage limits, tolerance-region or reference-interval approaches may be more appropriate. The value of the proposed method is that it provides an interpretable reference-based screening rule that is compatible with feature-wise diagnostics, family-wise calibration, sample-size planning, and cost-aware experimental design.

Appendix B. Measurement-System Error Sensitivity Framework

Observed AM measurements combine intrinsic process variation and measurement-system variation. This issue is particularly relevant for AM metrology because XCT, surface, and dimensional measurements can be affected by scan settings, reconstruction choices, thresholding, repeatability, and operator or system effects [71,72,73]. Recent work on integrated maximum mean discrepancy tests and multivariate average equivalence testing also provides useful future comparison points for broader distributional and multivariate equivalence-method development [74,75]. To evaluate whether metrology noise can materially alter the conclusions of the multivariate equivalency workflow, each observed measurement can be decomposed as

Y_{ij}^{\mathrm{obs}} = Y_{ij}^{\mathrm{proc}} + \epsilon_{ij}^{\mathrm{meas}}, \qquad (A1)

where $Y_{ij}^{\mathrm{obs}}$ is the observed value for artifact $i$ and feature $j$, $Y_{ij}^{\mathrm{proc}}$ is the latent process value, and $\epsilon_{ij}^{\mathrm{meas}}$ is the measurement error. A practical sensitivity study varies the ratio

\eta_{j} = \frac{\sigma_{j,\mathrm{meas}}}{\sigma_{j,\mathrm{proc}}}, \qquad (A2)

where $\sigma_{j,\mathrm{meas}}$ is the measurement-system standard deviation and $\sigma_{j,\mathrm{proc}}$ is the intrinsic process standard deviation for feature $j$. For each value of $\eta_{j}$, simulated observed data can be binned relative to the reference distribution and passed through the same power-analysis and cost-estimation workflow used in the main manuscript.
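
Equations (A1) and (A2) have a simple consequence under one additional assumption that the appendix does not state: the measurement error is zero-mean and independent of the process value. In that case $\mathrm{Var}(Y^{\mathrm{obs}}) = \sigma_{\mathrm{proc}}^{2}(1 + \eta^{2})$. Observed spread is therefore inflated by $\sqrt{1+\eta^{2}}$, and a process mean shift of $\delta\,\sigma_{\mathrm{proc}}$ appears as only $\delta/\sqrt{1+\eta^{2}}$ observed standard deviations. The sketch below checks this by simulation, using Gaussian errors as an assumption.

```python
import numpy as np


def observed_sd_factor(eta):
    """sqrt(1 + eta^2): observed-to-process SD ratio for independent, zero-mean measurement error."""
    return np.sqrt(1.0 + np.asarray(eta, dtype=float) ** 2)


def simulate_observed(y_proc, eta, sigma_proc, rng):
    """Eq. (A1) with Gaussian measurement error whose SD is eta * sigma_proc (Eq. (A2))."""
    return y_proc + rng.normal(0.0, eta * sigma_proc, size=np.shape(y_proc))


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    sigma_proc = 1.0
    y_proc = rng.normal(0.0, sigma_proc, size=200_000)  # hypothetical latent process values
    for eta in (0.0, 0.25, 0.5, 1.0):
        y_obs = simulate_observed(y_proc, eta, sigma_proc, rng)
        print(f"eta={eta}: simulated SD ratio={y_obs.std() / y_proc.std():.3f}, "
              f"theory={observed_sd_factor(eta):.3f}, "
              f"1-SD process shift appears as {1.0 / observed_sd_factor(eta):.3f} observed SDs")
```

This attenuation is why measurement noise tends to reduce power at fixed $n$ and to increase $n^{*}$.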

The expected diagnostic outputs are: (i) change in estimated power at fixed $n$, $p$, and $\lambda$; (ii) change in $n^{*}$ required to maintain 0.8 power; (iii) change in the cost-minimizing number of features; and (iv) change in false rejection or missed-detection behavior under equivalent and non-equivalent candidate scenarios. This appendix provides the structure for incorporating measurement-system analysis into the method. In practical applications, estimates of $\sigma_{j,\mathrm{meas}}$ should come from gauge repeatability and reproducibility studies, calibration studies, repeated microscope measurements, XCT metrology studies, or other process-specific measurement-system evaluations.
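
Diagnostic (i), the change in power at fixed $n$ and $p$, can be approximated with a small simulation. The sketch below is a simplified stand-in for the manuscript's workflow, with these assumptions:

- Cutpoints are population percentiles of the observed reference distribution, not estimates from a finite reference sample.
- The departure is a hypothetical mean shift in process units.
- Each feature is tested with a Pearson chi-squared test via `scipy.stats.chisquare`.
- Family-wise control uses a Šidák-adjusted per-feature level.

All defaults are hypothetical.

```python
import numpy as np
from scipy import stats


def power_vs_eta(etas, n=40, p=3, b=4, shift=0.75, alpha=0.01, n_sim=1000, seed=0):
    """Estimated family-wise detection rate of a process mean shift for each eta = sigma_meas / sigma_proc."""
    rng = np.random.default_rng(seed)
    alpha_feature = 1.0 - (1.0 - alpha) ** (1.0 / p)  # Šidák per-feature level
    power = {}
    for eta in etas:
        sd_obs = np.sqrt(1.0 + eta**2)  # observed reference SD when sigma_proc = 1
        cuts = stats.norm.ppf(np.linspace(0.0, 1.0, b + 1)[1:-1], scale=sd_obs)
        rejections = 0
        for _ in range(n_sim):
            y_proc = rng.normal(shift, 1.0, size=(n, p))        # shifted candidate process
            y_obs = y_proc + rng.normal(0.0, eta, size=(n, p))  # Eq. (A1) with sigma_meas = eta
            pvals = [
                stats.chisquare(np.bincount(np.searchsorted(cuts, y_obs[:, j]), minlength=b)).pvalue
                for j in range(p)
            ]
            rejections += min(pvals) < alpha_feature
        power[eta] = rejections / n_sim
    return power


if __name__ == "__main__":
    for eta, pw in power_vs_eta([0.0, 0.5, 1.0, 2.0], n_sim=500).items():
        print(f"eta={eta}: estimated power={pw:.3f}")
```

Repeating this sweep over a grid of $n$ values and reading off where power crosses 0.8 gives diagnostic (ii), the change in $n^{*}$.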

References

1. Chua, C.K.; Wong, C.H.; Yeong, W.Y. Standards, Quality Control, and Measurement Sciences in 3D Printing and Additive Manufacturing; Academic Press: Cambridge, MA, USA, 2017.
2. Slotwinski, J.A. Additive manufacturing: Overview and NDE challenges. AIP Conf. Proc. 2014, 1581, 1173–1177.
3. Umaras, E.; Tsuzuki, M.S.G. Additive manufacturing–considerations on geometric accuracy and factors of influence. IFAC-PapersOnLine 2017, 50, 14940–14945.
4. Gordelier, T.J.; Thies, P.R.; Turner, L.; Johanning, L. Optimising the FDM additive manufacturing process to achieve maximum tensile strength: A state-of-the-art review. Rapid Prototyp. J. 2019, 25, 953–971.
5. National Aeronautics and Space Administration (NASA). NASA-STD-6030: Additive Manufacturing Requirements for Spaceflight Systems; NASA: Washington, DC, USA, 2021.
6. National Aeronautics and Space Administration (NASA). NASA-STD-6033: Additive Manufacturing Requirements for Equipment and Facility Control; NASA: Washington, DC, USA, 2021.
7. ISO/ASTM 52941:2020; Additive Manufacturing—System Performance and Reliability—Acceptance Tests for Laser Metal Powder-Bed Fusion Machines for Metallic Materials for Aerospace Application. ISO/ASTM: Geneva, Switzerland, 2020.
8. ISO/ASTM 52930:2021; Additive Manufacturing—Qualification Principles—Installation, Operation and Performance (IQ/OQ/PQ) of PBF-LB Equipment. ISO/ASTM: Geneva, Switzerland, 2021.
9. SAE International. AMS7032: Machine Qualification for Fusion-Based Metal Additive Manufacturing; SAE International: Warrendale, PA, USA, 2022.
10. SAE International. AMS7003A: Laser Powder Bed Fusion Process; SAE International: Warrendale, PA, USA, 2022.
11. Chen, Z.; Han, C.; Gao, M.; Kandukuri, S.Y.; Zhou, K. A review on qualification and certification for metal additive manufacturing. Virtual Phys. Prototyp. 2022, 17, 382–405.
12. Adam, G.A.O.; Zimmer, D. Design for additive manufacturing—Element transitions and aggregated structures. CIRP J. Manuf. Sci. Technol. 2014, 7, 20–28.
13. Seifi, M.; Gorelik, M.; Waller, J.; Hrabe, N.; Shamsaei, N.; Daniewicz, S.; Lewandowski, J.J. Progress towards metal additive manufacturing standardization to support qualification and certification. JOM 2017, 69, 439–455.
14. Gibson, I.; Rosen, D.; Stucker, B.; Khorasani, M. Design for additive manufacturing. In Additive Manufacturing Technologies; Springer: Berlin/Heidelberg, Germany, 2021; pp. 555–607.
15. Martínez-García, A.; Monzón, M.; Paz, R. Standards for additive manufacturing technologies: Structure and impact. In Additive Manufacturing; Elsevier: Amsterdam, The Netherlands, 2021; pp. 395–408.
16. America Makes; ANSI Additive Manufacturing Standardization Collaborative. Standardization Roadmap for Additive Manufacturing, Version 3.0; America Makes: Youngstown, OH, USA; ANSI: Washington, DC, USA, 2023.
17. Mandolini, M.; Sartini, M.; Favi, C.; Germani, M. Cost sensitivity analysis for laser powder bed fusion. Proc. Des. Soc. 2022, 2, 1411–1420.
18. Kawalkar, R.; Dubey, H.K.; Lokhande, S.P. A review for advancements in standardization for additive manufacturing. Mater. Today Proc. 2022, 50, 1983–1990.
19. Anwar, T.U.; Merighe, P.; Kancharla, R.R.; Kombaiah, B.; Kouraytem, N. Influence of process parameter and build rate variations on defect formation in laser powder bed fusion SS316L. Materials 2025, 18, 435.
20. Lynch, C.M.; Villalobos, R.; Valadez Mesta, B.L.; Guillen, C.G.; Mireles, J.; Wicker, R.B. Determining Univariate Equivalency of Additively Manufactured Parts. J. Manuf. Mater. Process. 2026, 10, 134.
21. Tiscareno, E.G.; Taylor, H.C.; Lynch, C.M.; Villalobos, J.R.; Wicker, R.B. Sensitivity of mechanical properties to processing defects: Is tensile testing an appropriate metric for laser beam metal powder bed fusion machine qualification? Addit. Manuf. 2024, 93, 104367.
22. Kantzos, C.; Lao, J.; Rollett, A. Design of an interpretable convolutional neural network for stress concentration prediction in rough surfaces. Mater. Charact. 2019, 158, 109961.
23. Tofail, S.A.M.; Koumoulos, E.P.; Bandyopadhyay, A.; Bose, S.; O’Donoghue, L.; Charitidis, C. Additive manufacturing: Scientific and technological challenges, market uptake and opportunities. Mater. Today 2018, 21, 22–37.
24. Bosio, F.; Aversa, A.; Lorusso, M.; Marola, S.; Gianoglio, D.; Battezzati, L.; Fino, P.; Manfredi, D.; Lombardi, M. A time-saving and cost-effective method to process alloys by laser powder bed fusion. Mater. Des. 2019, 181, 107949.
25. Pannitz, O.; Sehrt, J.T. Transferability of process parameters in laser powder bed fusion processes for an energy and cost efficient manufacturing. Sustainability 2020, 12, 1565.
26. Du Plessis, A.; MacDonald, E.; Waller, J.M.; Berto, F. Non-destructive testing of parts produced by laser powder bed fusion. In Fundamentals of Laser Powder Bed Fusion of Metals; Elsevier: Amsterdam, The Netherlands, 2021; pp. 277–300.
27. Gorelik, M. Additive Manufacturing: A Regulatory Perspective. Presentation at the National Academies. 2016. Available online: https://sites.nationalacademies.org/cs/groups/pgasite/documents/webpage/pga_172540.pdf (accessed on 27 April 2026).
28. Donmez, A.; Fox, J.; Kim, F.; Lane, B.; Praniewicz, M.; Tondare, V.; Weaver, J.; Witherell, P. In-Process Monitoring and Non-Destructive Evaluation for Metal Additive Manufacturing Processes; NIST Interagency/Internal Report (NIST IR) 8538; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2024. Available online: https://nvlpubs.nist.gov/nistpubs/ir/2024/NIST.IR.8538.pdf (accessed on 27 April 2026).
29. Glaessgen, E.H.; Gorelik, M.; Battaile, C.C.; Lamm, D.; Levine, L.E.; Millwater, H.R.; Peralta-Duran, A.; Raghavan, N.; Rollett, A.D.; Schwalbach, E.J. Computational Materials for Qualification and Certification (CM4QC) Strategy Document: Maturation of Computational Materials Methods for Aviation-Focused Qualification and Certification of Metal Additive Manufacturing; NASA/TM–20260001729; NASA Langley Research Center: Hampton, VA, USA, 2026. Available online: https://ntrs.nasa.gov/api/citations/20260001729/downloads/NASATM-20260001729.pdf (accessed on 27 April 2026).
30. America Makes. Qualification Requirements for Additive Manufacturing Processes—Phase I, II, III. Project 5507.001. 2021. Available online: https://www.americamakes.us/projects/5507-001-qualification-requirements-for-additive-manufacturing-processes-phase-i-ii-iii/ (accessed on 27 April 2026).
31. Taylor, H.C.; Garibay, E.A.; Wicker, R.B. Toward a common laser powder bed fusion qualification test artifact. Addit. Manuf. 2021, 39, 101803.
32. Pereira, T.; Kennedy, J.V.; Potgieter, J. A comparison of traditional manufacturing vs additive manufacturing, the best method for the job. Procedia Manuf. 2019, 30, 11–18.
33. Chowdhury, S.; Yadaiah, N.; Prakash, C.; Ramakrishna, S.; Dixit, S.; Gupta, L.R.; Buddhi, D. Laser powder bed fusion: A state-of-the-art review of the technology, materials, properties & defects, and numerical modelling. J. Mater. Res. Technol. 2022, 20, 2109–2172.
34. Mesta, B.L.; Valadez, B.; Taylor, H.C.; Mireles, J.; Borup, D.; Wicker, R.B. User-enabled installation qualification method for laser-based powder bed fusion of metals (PBF-LB/M) machine scanner subsystem. Addit. Manuf. 2025, 100, 104694.
35. El Hassanin, A.; Napolitano, F.; Trimarco, C.; Manco, E.; Scherillo, F.; Borrelli, D.; Caraviello, A.; Squillace, A.; Astarita, A. Laser-powder bed fusion of Inconel 718 alloy: Effect of the contour strategy on surface quality and sub-surface density. Key Eng. Mater. 2022, 926, 280–287.
36. Deshmukh, K.; Riensche, A.; Bevans, B.; Lane, R.J.; Snyder, K.; Halliday, H.; Williams, C.B.; Mirzaeifar, R.; Rao, P. Effect of processing parameters and thermal history on microstructure evolution and functional properties in laser powder bed fusion of 316L. Mater. Des. 2024, 244, 113136.
37. Obeidi, M.A. Metal additive manufacturing by laser-powder bed fusion: Guidelines for process optimisation. Results Eng. 2022, 15, 100473.
38. Mussatto, A.; Groarke, R.; Vijayaraghavan, R.K.; Hughes, C.; Obeidi, M.A.; Doğu, M.N.; Yalçin, M.A.; McNally, P.J.; Delauré, Y.; Brabazon, D. Assessing dependency of part properties on the printing location in laser-powder bed fusion metal additive manufacturing. Mater. Today Commun. 2022, 30, 103209.
39. Cordova, L.; Bor, T.; de Smit, M.; Carmignato, S.; Campos, M.; Tinga, T. Effects of powder reuse on the microstructure and mechanical behaviour of Al–Mg–Sc–Zr alloy processed by laser powder bed fusion (LPBF). Addit. Manuf. 2020, 36, 101625.
40. Ronneberg, T.; Davies, C.M.; Hooper, P.A. Revealing relationships between porosity, microstructure and mechanical properties of laser powder bed fusion 316L stainless steel through heat treatment. Mater. Des. 2020, 189, 108481.
41. Townsend, A.; Senin, N.; Blunt, L.; Leach, R.K.; Taylor, J.S. Surface texture metrology for metal additive manufacturing: A review. Precis. Eng. 2016, 46, 34–47.
42. Abbas, N.; Zafar, R.F.; Riaz, M.; Hussain, Z. Progressive mean control chart for monitoring process location parameter. Qual. Reliab. Eng. Int. 2013, 29, 357–367.
43. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1988.
44. Battelle Memorial Institute. Metallic Materials Properties Development and Standardization (MMPDS-16); Battelle Memorial Institute: Columbus, OH, USA, 2021.
45. Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. 1900, 50, 157–175.
46. Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, USA, 1946.
47. Kojadinovic, I.; Yan, J. Modeling multivariate distributions with continuous margins using the copula R package. J. Stat. Softw. 2010, 34, 1–20.
48. Zhao, L.; Song, L.; Santos Macías, J.G.; Zhu, Y.; Huang, M.; Simar, A.; Li, Z. Review on the correlation between microstructure and mechanical performance for laser powder bed fusion AlSi10Mg. Addit. Manuf. 2022, 56, 102914.
49. Kan, W.H.; Chiu, L.N.S.; Lim, C.V.S.; Zhu, Y.; Tian, Y.; Jiang, D.; Huang, A. A critical review on the effects of process-induced porosity on the mechanical properties of alloys fabricated by laser powder bed fusion. J. Mater. Sci. 2022, 57, 9818–9865.
50. Lowry, C.A.; Montgomery, D.C. A review of multivariate control charts. IIE Trans. 1995, 27, 800–810.
51. Hotelling, H. Multivariate quality control—Illustrated by the air testing of sample bombsights. In Techniques of Statistical Analysis; McGraw-Hill: Columbus, OH, USA, 1947.
52. Šidák, Z. Rectangular confidence regions for the means of multivariate normal distributions. J. Am. Stat. Assoc. 1967, 62, 626–633.
53. Abdi, H. Bonferroni and Šidák corrections for multiple comparisons. In Encyclopedia of Measurement and Statistics; Sage: Thousand Oaks, CA, USA, 2007.
54. Lynch, C.M.; Perl, C.D. A User’s Guide for a Power Analysis by Simulation; ResearchGate: Berlin, Germany, 2024.
55. DebRoy, T.; Wei, H.L.; Zuback, J.S.; Mukherjee, T.; Elmer, J.W.; Milewski, J.O.; Beese, A.M.; Wilson-Heid, A.; De, A.; Zhang, W. Additive manufacturing of metallic components—Process, structure and properties. Prog. Mater. Sci. 2018, 92, 112–224.
56. Valadez Mesta, B.L.; Thome, P.; Lam, M.C.; Tin, S.; Mireles, J.; Wicker, R.B. Impact of a Typical Scanner Delay Processing Parameter on Local Microstructure in Metallic Laser-Based Powder Bed Fusion. Addit. Manuf. Lett. 2025, 13, 100273.
57. Bacchetti, P. Current sample size conventions: Flaws, harms, and alternatives. BMC Med. 2010, 8, 17.
58. Hitt, N.P.; Smith, D.R. Threshold-dependent sample sizes for selenium assessment with stream fish tissue. Integr. Environ. Assess. Manag. 2014, 11, 143–149.
59. Peck, L.R. Section editor’s note: Using power insights to better plan experiments. Am. J. Eval. 2023, 44, 114–117.
60. Dybå, T.; Kampenes, V.B.; Sjøberg, D.I.K. A systematic review of statistical power in software engineering experiments. Inf. Softw. Technol. 2006, 48, 745–755.
61. Higham, N.J. Computing the nearest correlation matrix—A problem from finance. IMA J. Numer. Anal. 2002, 22, 329–343.
62. Bates, D.; Maechler, M.; Jagan, M. Matrix: Sparse and Dense Matrix Classes and Methods; R Package, version 1.7-5; R Foundation for Statistical Computing: Vienna, Austria, 2026.
63. Berger, V.W.; Zhou, Y. Kolmogorov–Smirnov test: Overview. In Wiley StatsRef: Statistics Reference Online; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2014.
64. Lynch, C.M. MultivariateEquivalency: Multivariate Equivalency and Power Analysis Using Cramér’s V and Truncated Geometric Distributions; R Package, version 0.0.0.9000; GitHub, Inc.: San Francisco, CA, USA, 2026. Available online: https://github.com/colinmichaellynch/MultivariateEquivalency (accessed on 27 April 2026).
65. Scime, L.; Siddel, D.; Baird, S.; Paquit, V. Layer-wise anomaly detection and classification for powder bed additive manufacturing processes: A machine-agnostic algorithm for real-time pixel-wise semantic segmentation. Addit. Manuf. 2020, 36, 101453.
66. Cummings, C.; Corbin, D.J.; Reutzel, E.W.; Nassar, A.R. Resilience of laser powder bed fusion additive manufacturing to programmatically induced laser power anomalies. J. Laser Appl. 2023, 35, 032005.
67. Nkomo, B. Investigating how spatter evolves in metal additive manufacturing processes with machine learning. MATEC Web Conf. 2025, 417, 05001.
68. Aggarwal, C.C. An introduction to outlier analysis. In Outlier Analysis; Springer: Berlin/Heidelberg, Germany, 2016; pp. 1–34.
69. Filzmoser, P. Identification of multivariate outliers: A performance study. Austrian J. Stat. 2005, 34, 127–138.
70. Filzmoser, P.; Gregorich, M. Multivariate outlier detection in applied data analysis: Global, local, compositional and cellwise outliers. Math. Geosci. 2020, 52, 1049–1066.
71. Khorasani, M.; Gibson, I.; Ghasemi, A.H.; Hadavi, E.; Rolfe, B. Laser subtractive and laser powder bed fusion of metals: Review of process and production features. Rapid Prototyp. J. 2023, 29, 826–852.
72. Holgado, I.; Ortega, N.; Yagüe-Fabra, J.A.; Plaza, S.; Villarraga-Gómez, H. Metrological evaluation and classification of porosity in metal additive manufacturing using X-ray computed tomography. Mater. Des. 2025, 254, 114057.
73. Sun, X.; Huang, L.; Xiao, B.G.; Zhang, Q.; Li, J.Q.; Ding, Y.H.; Fang, Q.H.; He, W.; Xie, H.M. X-ray computed tomography in metal additive manufacturing: A review on prevention, diagnostic, and prediction of failure. Thin-Walled Struct. 2025, 207, 112736.
74. Ding, T.; Li, Z.; Zhang, Y. Testing the equality of distributions using integrated maximum mean discrepancy. J. Stat. Plan. Inference 2025, 236, 106246.
75. Boulaguiem, Y.; Insolia, L.; Victoria-Feser, M.-P.; Couturier, D.-L.; Guerrier, S. Multivariate adjustments for average equivalence testing. Stat. Med. 2025, 44, e10258.

Figure 1. Workflow for establishing a stable multifeature reference distribution and comparing a candidate process against the reference-defined bins. Panel (A) summarizes reference-process construction, and panel (B) summarizes candidate-process testing under the independent feature-wise multivariate implementation.

Figure 2. (A) Two correlated random variables, where each variable is divided into percentile bins illustrated by dashed grid lines. (B) Cells show whether candidate observations occur in the joint bins highlighted in (A); filled cells indicate occupied bins and empty cells indicate unoccupied bins. (C) Estimated sample sizes ($n$) needed to characterize joint reference distributions at different resolutions ($b$) and for different correlation values ($\rho$).
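
The reason joint binning becomes sample-intensive can be illustrated by counting cells. With $p$ features and $b$ bins per feature there are $b^{p}$ joint cells, and correlation concentrates probability near the diagonal, so off-diagonal cells are rarely occupied. The sketch below estimates, by simulation, the average fraction of the $b \times b$ joint percentile cells occupied by $n$ bivariate normal observations with correlation $\rho$. It is an illustration only, not the estimator used to produce panel (C). The bivariate normal model and the function name are assumptions.

```python
import numpy as np
from scipy import stats


def mean_occupied_fraction(n, b, rho, n_sim=200, seed=None):
    """Average fraction of the b x b joint percentile cells containing at least one of n observations."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    cuts = stats.norm.ppf(np.linspace(0.0, 1.0, b + 1)[1:-1])  # population percentile cutpoints
    fractions = np.empty(n_sim)
    for s in range(n_sim):
        xy = rng.multivariate_normal(np.zeros(2), cov, size=n)
        cell = np.searchsorted(cuts, xy[:, 0]) * b + np.searchsorted(cuts, xy[:, 1])  # joint cell id
        fractions[s] = np.unique(cell).size / b**2
    return fractions.mean()


if __name__ == "__main__":
    print("joint cells for b = 10: p = 2 ->", 10**2, "; p = 3 ->", 10**3)
    for rho in (0.0, 0.5, 0.9):
        for n in (40, 200, 1000):
            frac = mean_occupied_fraction(n, 10, rho, seed=0)
            print(f"rho={rho}, n={n}: occupied fraction={frac:.2f}")
```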

Figure 3. Relationship between power and sample size. Each point represents the results from a single simulation, and the blue line shows the logistic regression fit to these data. The horizontal dashed line marks a power level of 0.8, and the vertical dashed line marks $n^{*}$. Here, $\lambda = 0.2$, $p = 1$, and $n^{*} = 78$.
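
The $n^{*}$ read-off described for Figure 3 can be sketched as a logistic regression of simulated rejection outcomes on sample size. The fitted model, $\mathrm{logit}\,P(\text{reject}) = \beta_0 + \beta_1 n$, is then solved for the sample size at which fitted power equals 0.8: $n^{*} = (\mathrm{logit}(0.8) - \beta_0)/\beta_1$. The code fits the model by maximum likelihood with `scipy.optimize.minimize`. The outcomes are drawn from a hypothetical logistic curve ($\beta_0 = -3$, $\beta_1 = 0.06$), not from the study's power simulations.

```python
import numpy as np
from scipy import optimize
from scipy.special import expit


def fit_logistic(n, rejected):
    """Maximum-likelihood fit of logit P(reject) = b0 + b1 * n; returns (b0, b1)."""
    mu, sd = n.mean(), n.std()
    x = (n - mu) / sd  # standardize n for a better-conditioned optimization

    def neg_log_lik(beta):
        eta = beta[0] + beta[1] * x
        nll = np.sum(np.logaddexp(0.0, eta) - rejected * eta)
        resid = expit(eta) - rejected
        return nll, np.array([resid.sum(), (resid * x).sum()])  # value and analytic gradient

    a, c = optimize.minimize(neg_log_lik, x0=np.zeros(2), jac=True, method="BFGS").x
    return a - c * mu / sd, c / sd  # back-transform to the original n scale


def n_star(b0, b1, target=0.8):
    """Sample size at which fitted power equals target."""
    return (np.log(target / (1.0 - target)) - b0) / b1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = rng.integers(10, 151, size=4000).astype(float)  # hypothetical simulated sample sizes
    rejected = (rng.random(n.size) < expit(-3.0 + 0.06 * n)).astype(float)
    b0, b1 = fit_logistic(n, rejected)
    print(f"b0={b0:.3f}, b1={b1:.4f}, n*={n_star(b0, b1):.1f}")
```

Fitting a smooth curve to all simulation outcomes uses every replicate. This avoids relying on a noisy empirical power estimate at a single $n$.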

Figure 4. An example set of costs for experiments with varying numbers of measured variables. The cost of printing all artifacts, $C^{(a)}$, decreases with $p$, as fewer artifacts are needed to achieve a power level of 0.8 when more measurements are taken of each artifact. Conversely, the measurement cost $C^{(m)}$ increases with $p$, although not linearly, because fewer total artifacts are measured at high values of $p$. The sum of these costs ($C$) almost always has a minimum at some intermediate value of $p$.

Figure 5. Cost response surface for designs that can discriminate reference-bin effect sizes $\lambda$ at a power level of 0.8 across different numbers of measured variables $p$. The symbol # on the x-axis denotes the number of measured variables. Filled contours show the estimated total unit cost from the sample-size and cost model. Black points show feasible designs where the required number of printed artifacts satisfies $38 \leq n^{*} \leq 42$, and green points identify the minimum-cost feasible design for each effect size. The black line connects these green points to summarize the cost-efficient design path across effect sizes: lower-cost designs can be obtained with fewer measured variables, whereas smaller detectable effect sizes generally require additional measured variables and higher total cost.

Figure 6. Unit cost comparison between an independent multivariate equivalency method when the data are uncorrelated (x-axes) and this same method used on correlated data (y-axes). The dashed line gives the one-to-one comparison between these two cases. The relative costs are influenced by both the number of variables p (panel (A)) and the underlying effect size (panel (B)).

Figure 7. Comparison of the standard chi-squared statistic (panel (A)) and the modified $\chi_{+1}^{2}$ statistic used for the dependent bivariate equivalency extension (panel (B)). Simulated multinomial counts were generated under the null condition of equal bin probabilities, and the resulting empirical distributions were compared with the corresponding theoretical $\chi^{2}$ density. The modified statistic closely tracks the standard chi-squared distribution, supporting its use as an approximate diagnostic statistic when zero expected counts would otherwise make the standard chi-squared calculation undefined.

Figure 8. Unit-cost comparison between the independent feature-wise multivariate method and the exploratory dependent bivariate method. Panel (A) compares unit costs across correlation values ($\rho$), and panel (B) compares unit costs across reference-bin effect sizes ($\lambda$). The dashed line gives the one-to-one comparison; deviations from this line show how correlation ($\rho$) and effect size ($\lambda$) influence relative cost.

Figure 9. Quadrilateral traces used for corner deviation measurements. (A) Full subgroup image with vector centerlines; cyan lines mark the vector centerlines and construction guides. (B) Direction of positive and negative deviations from centerlines. (C) Graphical representation of the corner deviation measurement for 45°. (D) Graphical representation of the corner deviation measurement for 135°. (E) Graphical representation of the corner deviation measurement for 90°. In panels (C–E), green dashed curves mark the vector edges, yellow arrows identify the inner and outer edges, red arrows show the corner-deviation distances, and cyan lines show the centerline/bisector construction guides.

Figure 10. Raw corner-deviation distributions for the reference process, the expected-equivalent AconityMIDI+ candidate, and the expected non-equivalent SLM280 HL candidate. Distributions are shown separately for the 45°, 90°, and 135° corner-deviation features. The AconityMIDI+ candidate closely overlaps the reference process, whereas the SLM280 HL candidate shows pronounced distributional shifts, especially for the 90° and 135° features. Corner deviations are measured in microns.

Figure 11. Pairwise Pearson correlations among the 45°, 90°, and 135° corner-deviation features within each process. Correlations were generally weak, with the main exception being a moderate negative correlation between the 90° and 135° features in the reference process.

Figure 12. Hotelling $T^{2}$ control charts for the reference AconityMIDI+ process, the expected-equivalent AconityMIDI+ candidate, and the expected non-equivalent SLM280 HL candidate using the three corner-deviation features measured at 45°, 90°, and 135°. Each point represents one multifeature observation. The horizontal dashed line in each panel gives the upper control limit (UCL); triangles indicate observations within the UCL, and the filled circle marks the observation above the UCL. All reference observations fall below the control limit, supporting the use of this process as the stable reference distribution for the validation study. Candidate-process charts are included as diagnostic checks for unusual multivariate observations.
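
One common Phase I construction for charts of individual multivariate observations is sketched below. It computes each observation's $T^{2}$ from the sample mean and covariance of the same data and uses the beta-distribution limit $\mathrm{UCL} = \frac{(m-1)^2}{m} B_{1-\alpha}\!\left(\frac{p}{2}, \frac{m-p-1}{2}\right)$. The caption does not state which control-limit formula or $\alpha$ was used. The beta-based UCL and the default $\alpha = 0.0027$ are therefore assumptions, and the example data are hypothetical.

```python
import numpy as np
from scipy import stats


def phase1_t2(x):
    """T^2 of each row relative to the sample mean and covariance of the same data."""
    centered = x - x.mean(axis=0)
    s = np.cov(x, rowvar=False)
    return np.einsum("ij,ij->i", centered, np.linalg.solve(s, centered.T).T)


def phase1_ucl(m, p, alpha=0.0027):
    """Beta-distribution UCL for Phase I individual observations (m observations, p features)."""
    return (m - 1) ** 2 / m * stats.beta.ppf(1.0 - alpha, p / 2.0, (m - p - 1) / 2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = rng.standard_normal((40, 3))  # hypothetical stand-in for 40 observations of 3 features
    t2 = phase1_t2(x)
    ucl = phase1_ucl(*x.shape)
    print(f"UCL={ucl:.2f}; observations above UCL: {np.flatnonzero(t2 > ucl)}")
```

The beta limit, rather than a chi-squared or F limit, is used because in Phase I each observation contributes to the mean and covariance against which it is judged.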

Figure 13. Candidate observations assigned to reference-defined percentile bins for each corner-deviation feature. The horizontal dashed line indicates the expected count per bin under the equivalency hypothesis. The AconityMIDI+ candidate is spread across the reference-defined bins, whereas the SLM280 HL candidate is strongly concentrated in the tail bins, especially for the 90° and 135° features.
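The bin assignment in Figure 13 can be sketched in a few lines of Python, assuming $k$ equal-probability bins whose interior edges are percentiles of the reference sample. The number of bins, the quantile convention (`method="inclusive"` in `statistics.quantiles`), and the tie rule (a value equal to an edge goes to the upper bin via `bisect_right`) are assumptions, not details taken from this section; the helper names are hypothetical. Under the equivalency hypothesis, each equal-probability bin expects about $n/k$ candidate observations.

```python
from bisect import bisect_right
from statistics import quantiles


def reference_bin_edges(reference, k):
    """Interior cut points splitting the reference sample into k equal-probability bins."""
    return quantiles(reference, n=k, method="inclusive")


def bin_counts(candidate, edges):
    """Count candidate observations in each reference-defined bin.

    bisect_right sends a value equal to an edge into the upper bin.
    """
    counts = [0] * (len(edges) + 1)
    for x in candidate:
        counts[bisect_right(edges, x)] += 1
    return counts
```

A candidate drawn from the reference distribution should give roughly flat counts, while a location shift piles observations into the upper or lower tail bins, which is the pattern the caption describes for the SLM280 HL candidate.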

Table 1. Kolmogorov–Smirnov test results comparing $\chi^{2}$ and $\chi_{+1}^{2}$ distributions.

Degrees of Freedom (d)	KS Statistic (D)	Unadjusted p-Value	Holm–Bonferroni Adjusted p
2	0.18	0.078	0.548
3	0.10	0.699	1.000
4	0.07	0.967	1.000
5	0.07	0.967	1.000
10	0.08	0.906	1.000
20	0.16	0.155	0.772
50	0.17	0.111	0.667
100	0.19	0.054	0.433
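The adjusted column in Table 1 follows the Holm step-down procedure: sort the $m$ p-values, multiply the $j$-th smallest by $m-j+1$, enforce monotonicity with a running maximum, and cap the result at 1. The following minimal Python sketch (function name hypothetical) reproduces the reported values to within the rounding of the unadjusted p-values; for example, $8 \times 0.054 = 0.432$ against the reported 0.433.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is scaled by (m - rank).
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

The same function applies unchanged to the three within-process comparisons in Table 3, where it gives, for instance, $3 \times 0.023 = 0.069$ for the reference 90°–135° pair.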

Table 2. Descriptive statistics for the 45°, 90°, and 135° corner-deviation variables (in microns) by process.

Process	Feature	N	Mean	SD	Median	Min	Max
Reference	45°	40	229.62	4.59	230.07	222.79	237.06
Reference	90°	40	162.24	9.59	162.15	149.71	189.08
Reference	135°	40	80.85	8.59	80.06	64.57	102.34
Aconity Candidate	45°	40	230.43	7.36	229.88	213.40	245.04
Aconity Candidate	90°	40	162.05	11.34	161.97	141.83	186.85
Aconity Candidate	135°	40	83.21	7.93	84.42	66.61	99.68
SLM Candidate	45°	40	227.13	9.73	226.14	206.78	245.95
SLM Candidate	90°	40	192.50	10.15	192.10	168.29	214.10
SLM Candidate	135°	40	118.26	3.81	118.05	109.52	126.92
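One way to read Table 2 alongside Figure 13 and Table 4 is to express each candidate's mean shift in units of the reference standard deviation and to compare spreads. The following Python sketch does this from the tabulated summary values only. This summary is an illustrative aid, not part of the equivalency method, and the names are hypothetical.

```python
# (mean, SD) in microns per process and feature, copied from Table 2.
TABLE_2 = {
    "Reference": {"45": (229.62, 4.59), "90": (162.24, 9.59), "135": (80.85, 8.59)},
    "Aconity Candidate": {"45": (230.43, 7.36), "90": (162.05, 11.34), "135": (83.21, 7.93)},
    "SLM Candidate": {"45": (227.13, 9.73), "90": (192.50, 10.15), "135": (118.26, 3.81)},
}


def shift_and_spread(candidate, reference="Reference"):
    """Per feature: (mean shift in reference-SD units, candidate SD / reference SD)."""
    out = {}
    for feature, (ref_mean, ref_sd) in TABLE_2[reference].items():
        cand_mean, cand_sd = TABLE_2[candidate][feature]
        out[feature] = (round((cand_mean - ref_mean) / ref_sd, 2),
                        round(cand_sd / ref_sd, 2))
    return out
```

For the SLM280 HL candidate, the 90° and 135° means sit about 3.16 and 4.36 reference SDs above the reference means, consistent with the tail-bin concentration in Figure 13. The 45° mean shifts by only about 0.54 SD, but its SD is about 2.12 times the reference SD, which may explain why that feature is also flagged in Table 4. The AconityMIDI+ candidate's mean shifts stay within about 0.3 reference SD for all three features.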

Table 3. Pairwise Pearson correlations among the three corner-deviation variables within each process. Holm-adjusted p-values are included to account for the three pairwise comparisons within each process.

Process	Feature Pair	Pearson r	p-Value	Holm-Adjusted p
Reference	45°–90°	0.17	0.307	0.614
Reference	45°–135°	0.01	0.969	0.969
Reference	90°–135°	−0.36	0.023	0.069
Aconity Candidate	45°–90°	0.21	0.187	0.561
Aconity Candidate	45°–135°	−0.11	0.494	0.988
Aconity Candidate	90°–135°	−0.09	0.595	0.988
SLM Candidate	45°–90°	−0.09	0.561	1.000
SLM Candidate	45°–135°	0.03	0.860	1.000
SLM Candidate	90°–135°	−0.02	0.899	1.000
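The correlation screening in Table 3 and Figure 11 can be approximated in standard-library Python. The following sketch computes Pearson's $r$ and a two-sided p-value for $H_0\!: \rho = 0$ using Fisher's transformation, $z = \operatorname{artanh}(r)\sqrt{n-3}$. This is a large-sample approximation swapped in for convenience; the usual exact test uses a $t$ distribution with $n-2$ degrees of freedom, so the p-values will differ slightly from the table. For the reference 90°–135° pair ($r = -0.36$, $n = 40$), the approximation gives about 0.022 versus the reported 0.023. Function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Approximate two-sided p-value for rho = 0 via Fisher's z (needs |r| < 1, n > 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```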

Table 4. Feature-wise multivariate equivalency results for the validation study. For each candidate process and corner-deviation feature, candidate observations were assigned to bins defined by the corresponding reference distribution. The independent multivariate decision uses a Šidák-adjusted feature-wise significance level of $\alpha' = 0.00334$ to control the family-wise error rate across the three features.

Comparison	Feature	N	$\tilde{V}$	$CI^{+}$	$\alpha'$	Decision
Aconity Candidate vs. Reference	45°	40	0.207	0.276	0.00334	Equivalent
Aconity Candidate vs. Reference	90°	40	0.214	0.276	0.00334	Equivalent
Aconity Candidate vs. Reference	135°	40	0.207	0.276	0.00334	Equivalent
SLM Candidate vs. Reference	45°	40	0.357	0.276	0.00334	Non-equivalent
SLM Candidate vs. Reference	90°	40	0.944	0.276	0.00334	Non-equivalent
SLM Candidate vs. Reference	135°	40	1.000	0.276	0.00334	Non-equivalent
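The Šidák level in Table 4 follows from $\alpha' = 1 - (1-\alpha)^{1/p}$. With $p = 3$ features, the reported $\alpha' = 0.00334$ is consistent with a family-wise level of $\alpha = 0.01$; this value is inferred from the arithmetic and is not stated in this section. The following Python sketch computes $\alpha'$ and then applies a decision rule read off the table: a feature is judged non-equivalent when $\tilde{V}$ exceeds $CI^{+}$, and the candidate is judged equivalent only if every feature passes. The comparison direction, the treatment of ties, and the function names are assumptions. $CI^{+}$ is taken as given rather than recomputed.

```python
def sidak_alpha(family_alpha, n_features):
    """Per-feature level that holds the family-wise error rate at family_alpha
    for independent tests."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / n_features)


def feature_decision(v_tilde, ci_plus):
    # Assumption read off Table 4: non-equivalent when V~ exceeds CI+.
    return "Equivalent" if v_tilde <= ci_plus else "Non-equivalent"


def joint_decision(v_by_feature, ci_plus):
    """Per-feature decisions plus an overall pass only if all features pass."""
    decisions = {f: feature_decision(v, ci_plus) for f, v in v_by_feature.items()}
    overall = all(d == "Equivalent" for d in decisions.values())
    return decisions, overall
```

Applied to the Table 4 rows, the AconityMIDI+ values (0.207, 0.214, 0.207) all fall below $CI^{+} = 0.276$, whereas all three SLM280 HL values exceed it, matching the reported decisions.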

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lynch, C.M.; Villalobos, R.; Valadez Mesta, B.L.; Gomez Guillen, C.; Mireles, J.; Wicker, R.B. Practical Multivariate Equivalency Testing for Additively Manufactured Parts: Comparing Independent and Dependent Cases. J. Manuf. Mater. Process. 2026, 10, 229. https://doi.org/10.3390/jmmp10070229

AMA Style

Lynch CM, Villalobos R, Valadez Mesta BL, Gomez Guillen C, Mireles J, Wicker RB. Practical Multivariate Equivalency Testing for Additively Manufactured Parts: Comparing Independent and Dependent Cases. Journal of Manufacturing and Materials Processing. 2026; 10(7):229. https://doi.org/10.3390/jmmp10070229

Chicago/Turabian Style

Lynch, Colin M., Rene Villalobos, Brenda Leticia Valadez Mesta, Cesar Gomez Guillen, Jorge Mireles, and Ryan B. Wicker. 2026. "Practical Multivariate Equivalency Testing for Additively Manufactured Parts: Comparing Independent and Dependent Cases" Journal of Manufacturing and Materials Processing 10, no. 7: 229. https://doi.org/10.3390/jmmp10070229

APA Style

Lynch, C. M., Villalobos, R., Valadez Mesta, B. L., Gomez Guillen, C., Mireles, J., & Wicker, R. B. (2026). Practical Multivariate Equivalency Testing for Additively Manufactured Parts: Comparing Independent and Dependent Cases. Journal of Manufacturing and Materials Processing, 10(7), 229. https://doi.org/10.3390/jmmp10070229
