1. Introduction
The laser powder bed fusion (L-PBF) of Ti-6Al-4V enables complex geometries and functional surfaces, but reliable qualification and rapid recipe development are still constrained by the difficulty of predicting performance from process settings alone [1,2,3]. The underlying scientific challenge is the coupled process–structure–property pathway, in which process variables and engineered energy-input descriptors act as proxies for thermal histories that shape defect populations and microstructural evolution, and these latent states mediate mechanical and surface responses [4,5,6,7,8,9]. A practical solution is to build recipe-level surrogate models that learn this mapping using a consistent experimental unit and statistically defensible evaluation while integrating morphology and proxy information when available and quantifying uncertainty for decision-making [10,11]. In this study, the learning problem is explicitly defined at the set level (set_id), and the dataset is treated as intrinsically multimodal by combining process and engineered physics features with pore, prior-β grain, and stress–strain-derived proxy blocks where available.
Research directly aligned with deployable, recipe-level, multimodal property prediction remains limited, and much of the literature addresses narrower subproblems. Early representative work focused on predicting porosity from process parameters using statistical models with Bayesian inference (for example, Tapia and Elwany, 2015) [12]. Subsequent work developed Gaussian process formulations for porosity prediction as a function of process parameters, including spatial Gaussian process regression variants [13]. A widely used modern pattern for efficient exploration is to couple Gaussian process regression with Bayesian optimization to guide sampling of the L-PBF process space, including demonstrations on Ti-6Al-4V that combine rapid characterization with model-guided discovery of processing domains [14]. In parallel, the community continues to debate the reliability of scalar energy-density compressions as general design variables, and recent reviews emphasize that volumetric energy density can correlate with outcomes in restricted settings while failing as a universal predictor when different parameter combinations yield different melt-pool regimes [15,16]. These directions are informative, but they do not by themselves resolve the deployment gap created by incomplete morphology availability, small recipe counts, and the need for uncertainty-aware co-design across multiple targets.
The present work addresses the deployment gap by structuring the entire pipeline around recipe-level learning, leakage control, modality heterogeneity, and decision-time constraints, rather than assuming retrospective access to all measurements. First, each model is trained and evaluated under grouped cross-validation by set_id so that no set contributes to both training and test partitions within a fold, and every data-dependent transformation is fit on the training portion only and applied unchanged to the held-out portion. Second, multimodality is introduced through explicit set-aligned feature blocks, assembled by merging process and engineered physics descriptors with pooled pore, grain, and stress–strain-derived descriptor blocks, with missingness tracked as a structural property of the dataset rather than treated as an error. Third, the key methodological distinction relative to many morphology-augmented surrogates is the separation between an oracle regime and a deployable regime: morphology proxies are treated as informative but incompletely observed, so modality embeddings are constructed within training folds and then used either as oracle embeddings when present or as predicted embeddings inferred from process features when morphology is absent. This design explicitly avoids reporting oracle-only improvements as deployable gains and introduces an embedding predictability diagnostic as a necessary condition for deployable multimodal inference. Finally, the surrogate stack is integrated into a decision layer that uses fold ensembles to produce prediction means and dispersion estimates and applies conservative lower-confidence bounds for candidate screening and ranking under constraints.
The aim is to establish a deployable surrogate workflow that is stable under small sample size and partially observed modalities, and that supports downstream design decisions without assuming unavailable measurements. Concretely, the objectives are to obtain target-wise, fold-aggregated performance under the fixed GroupKFold protocol, to quantify the oracle versus deployable gap for morphology information, and to determine which modality embeddings are predictable from process variables with non-trivial fidelity using out-of-fold diagnostics. In addition, the study aims to produce a reproducible co-design procedure that generates a finite candidate pool within an observed process envelope, computes engineered physics features consistently, propagates predicted embeddings when required, and evaluates feasibility and ranking under uncertainty-aware constraints using a fixed candidate budget.
The significance is that the work provides a deployment-aligned template for recipe-level learning and uncertainty-aware co-design in L-PBF Ti-6Al-4V, where conclusions depend on strict leakage control, explicit treatment of missing modalities, and separation of scientific upper bounds from decision-time feasibility [17,18]. The oracle versus deployable comparison yields an interpretable bound on the value of morphology information and clarifies when additional characterization is necessary versus when morphology can be treated as a latent state recoverable from process descriptors for prospective screening [19,20,21]. By incorporating uncertainty into constraint satisfaction and ranking, the framework reduces the risk of brittle recipe selection and supports multi-objective tuning for applications in which bulk mechanical requirements must coexist with surface-relevant constraints. Although developed for L-PBF Ti-6Al-4V, the present framework is transferable to other process–structure–property systems with sparse, partially observed multimodal data, provided that system-specific descriptors and target variables are reformulated and the models are retrained and prospectively validated in the new domain.
2. Materials, Data Sources, and Preprocessing
A recipe-level dataset for L-PBF Ti-6Al-4V is assembled such that each manufacturing condition is treated as a single independent set (set_id, 1–42), and all endpoints are represented at this set level using arithmetic means consistent with the database convention. Multimodal descriptors (process/engineered physics and morphology/microstructure proxies) are aligned to set_id, while non-uniform modality/label availability is treated as intrinsic and managed through explicit missing-data handling and grouped cross-validation with fold-local preprocessing to prevent leakage, as provided in Appendix A.
All analyses were conducted in a Conda-managed Python 3.11.15 environment on Windows, using NumPy 2.4.3, pandas 3.0.1, SciPy 1.17.1, scikit-learn 1.8.0, statsmodels 0.14.6, matplotlib 3.10.8, seaborn 0.13.2, openpyxl 3.1.5, and pyarrow 23.0.1.
Figure 1 summarizes the leakage-safe data construction (Figure 1a) and provides a schematic of the deployable surrogate and uncertainty-aware co-design loop evaluated in later sections (Figure 1b,c).
2.1. Study Design and Set-Level Unit of Analysis (set_id)
The analytical unit is the set, indexed by set_id (range 1–42), where each set corresponds to a unique L-PBF manufacturing condition (recipe). All observations associated with the same set_id are treated as non-independent; this design prevents pseudoreplication arising from within-condition dependence when multiple specimens or repeated measurements exist for a given recipe.
All endpoints are represented at the set level. Mechanical and surface responses are collapsed to a single set-level target value using the arithmetic mean consistent with the database convention (suffix “__mean”; target aggregation defined in Section 2.3). Target completeness differs across endpoints, with roughness exhibiting partial label availability relative to the mechanical properties, motivating an explicit missing-data policy and fold construction that preserve the set_id grouping (Section 2.5 and Section 2.6; Figure 2c).
Multimodal descriptors are aligned to set_id prior to modeling. The feature space comprises (i) process inputs and engineered physics features and (ii) morphology and microstructure proxies derived from object-level pore and prior-β grain descriptors and, when available, stress–strain-derived descriptors. Within-set heterogeneity is not discarded; object-level distributions are represented through pooled moments and quantiles to retain dispersion and tail behavior in the input space (Section 2.4; Figure 2e,f). Leakage control is implemented procedurally by grouped cross-validation, where train/test splits are performed by set_id so that no set contributes to both training and evaluation within a fold (Section 2.6).
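The grouped splitting described above can be sketched with scikit-learn's GroupKFold; the data here are random stand-ins (42 recipes with two rows each) used only to illustrate the no-overlap guarantee, not the study's actual table.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Illustrative stand-in data: 42 recipes (set_id) with two rows each.
rng = np.random.default_rng(0)
set_id = np.repeat(np.arange(1, 43), 2)
X = rng.normal(size=(set_id.size, 4))
y = rng.normal(size=set_id.size)

gkf = GroupKFold(n_splits=5)
fold_assignments = []
for train_idx, test_idx in gkf.split(X, y, groups=set_id):
    # A set never contributes to both partitions of the same fold.
    assert set(set_id[train_idx]).isdisjoint(set_id[test_idx])
    fold_assignments.append(sorted(set(set_id[test_idx])))
```

With 42 equally sized groups and five splits, the held-out folds contain 9, 9, 8, 8, and 8 recipes, matching the split sizes reported in Section 2.6.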
2.2. L-PBF Process Parameter Space and Engineered Physics Features
The primary covariates include L-PBF process parameters defining the processing window (for example, laser power P, scan speed v, hatch spacing h, and layer thickness t, where available). To provide compact, physically motivated representations of the process space, energy-density variants are constructed in addition to the raw parameters. In particular, the line energy density (LED) is defined as
$$\mathrm{LED} = \frac{P}{v}, \tag{1}$$
where P denotes laser power, and v denotes scan speed. When geometric terms are available, a volumetric energy-density variant consistent with standard L-PBF practice is included,
$$\mathrm{VED} = \frac{P}{v\,h\,t}, \tag{2}$$
where h is hatch spacing, and t is layer thickness. These descriptors are treated as nominal energy-input proxies that support comparisons across parameterizations and complement the base process variables; their inclusion alongside constituent parameters is an expected redundancy within the locked feature specification rather than a post hoc feature-selection choice. The resulting operating-space coverage and the distributional support of key variables are reported in Figure 2b, and the overall block structure of inputs feeding the surrogate models is summarized in Figure 2a. The conditional availability of h and t is handled under the missingness and feature-filtering policy described in Section 2.5; feature definitions and units are given in Appendix A.3.
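A minimal sketch of the engineered-feature step, computing LED = P/v and, when hatch spacing and layer thickness exist, VED = P/(v·h·t). The column names (`P_W`, `v_mm_s`, `h_mm`, `t_mm`) are illustrative assumptions, not the study's locked schema.

```python
import pandas as pd

def add_energy_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append nominal energy-input proxies to a recipe table.
    Column names are hypothetical stand-ins for the locked schema."""
    out = df.copy()
    out["LED"] = out["P_W"] / out["v_mm_s"]  # line energy density, J/mm
    if {"h_mm", "t_mm"}.issubset(out.columns):
        # Volumetric energy density, J/mm^3, only when geometry is known.
        out["VED"] = out["P_W"] / (out["v_mm_s"] * out["h_mm"] * out["t_mm"])
    return out

recipes = pd.DataFrame({"P_W": [200.0], "v_mm_s": [1000.0],
                        "h_mm": [0.10], "t_mm": [0.03]})
feats = add_energy_features(recipes)
```

Keeping the raw parameters alongside LED/VED preserves the deliberate redundancy noted in the text: the derived columns are deterministic functions of the controls, not independent inputs.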
In Figure 2, we explicitly distinguish output targets from model inputs. Among the inputs, laser power and scan speed are the primary raw controllable process variables visualized in the present dataset, whereas quantities such as VED and LED are deterministic engineered descriptors derived from those variables and are included as supplementary covariates rather than as independent process controls. Pore-related quantities shown in Figure 2 are likewise not treated as raw process parameters; they are morphology/proxy inputs derived from post hoc characterization and are analyzed separately at the modality level in Figure 2c–f.
2.3. Target Definitions: Yield, UTS, Elongation, Modulus, Roughness, Hardness
Six set-level response variables quantify mechanical performance and surface quality: 0.2% offset yield strength (yield, MPa), ultimate tensile strength (UTS, MPa), total elongation to fracture (elongation, %), Young’s modulus (modulus, GPa), surface roughness (roughness, µm), and Vickers microhardness (hardness, HV). Each target is represented at the set level, consistent with the unit of analysis defined in Section 2.1, such that each row corresponds to a unique build condition and its associated measurement set. When replicate measurements or multiple measurement locations are available for a given set, the reported target is the within-set arithmetic mean, aligned with the database convention (suffix “__mean”) as defined in Equation (3),
$$\bar{y}_s = \frac{1}{n_s}\sum_{j=1}^{n_s} y_{s,j}, \tag{3}$$
where $y_{s,j}$ denotes the jth measurement for set s, and $n_s$ is the number of measurements available for that set and target.
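The within-set arithmetic mean can be expressed as a one-line groupby; the specimen-level table and column names below are hypothetical illustrations of the "__mean" convention.

```python
import pandas as pd

# Hypothetical specimen-level measurements; names are assumptions.
specimens = pd.DataFrame({
    "set_id": [1, 1, 2, 2, 2],
    "yield": [950.0, 960.0, 1010.0, 1000.0, 1005.0],
})

# Collapse replicates to one row per recipe with the arithmetic mean,
# matching the database "__mean" suffix convention.
set_level = (specimens.groupby("set_id", as_index=False)["yield"]
             .mean()
             .rename(columns={"yield": "yield__mean"}))
```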
The empirical distributions of the six targets, including skew and outliers, are summarized in Figure 2c. Dataset-level completeness differs by target, with roughness exhibiting partial label availability relative to the mechanical properties; this motivates explicit missing-data handling and evaluation procedures described in Section 2.5 and Section 2.6. Target definitions and units, together with the number of labeled sets per target, are reported in Table 1.
2.4. Morphology and Microstructure Proxies
Morphology and microstructure information is available as object-level measurements (for example, individual pore or grain objects) that vary in count across builds and imaging fields. To enable consistent learning at the set level, all morphology modalities are reduced to fixed-length set-level descriptors by statistical pooling over the object-level distributions within each set_id. This produces a tabular representation that can be merged with the L-PBF process variables and engineered physics features while preserving dominant distributional characteristics (central tendency, spread, tails, and extremes) relevant to defect populations and microstructural heterogeneity; modality-specific feature construction and aggregation rules are detailed in Appendix A.4 (Figure 2a,e).
Let $x_1, \dots, x_n$ denote the observations of an object-level scalar descriptor within one set, such as pore equivalent diameter, pore sphericity, pore volume, or grain aspect ratio, with n objects detected for that set. The pooled set-level moments and quantiles are given in Equation (4),
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \mu\right)^2}, \qquad q_p = \operatorname{quantile}_p\!\left(x_1, \dots, x_n\right). \tag{4}$$
For each primitive descriptor, the pooled feature block includes these moments and quantiles together with derived transforms for heavy-tailed distributions (for example, log-transformed variants of strictly positive measures). The resulting morphology features are therefore interpretable distribution summaries and can be directly analyzed for coverage and missingness (Figure 2e) and for redundancy against process and engineered features (Figure 2d). These pooled summaries also provide the input representation used by subsequent morphology-module experiments that construct low-dimensional morphology embeddings under fold-respecting procedures (Appendix C).
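The pooling step can be sketched as a function that turns a variable-length object population into a fixed-length feature dictionary; the exact statistic list here (mean, standard deviation, selected quantiles, max, count) is an assumption patterned on the text, not the study's locked feature set.

```python
import numpy as np

def pool_descriptor(x: np.ndarray, name: str) -> dict:
    """Collapse one object-level descriptor (e.g. pore equivalent
    diameter) for a single set into fixed-length summary statistics."""
    q10, q50, q90, q99 = np.quantile(x, [0.10, 0.50, 0.90, 0.99])
    return {
        f"{name}__mean": float(np.mean(x)),
        f"{name}__std": float(np.std(x, ddof=1)) if x.size > 1 else 0.0,
        f"{name}__q10": float(q10),
        f"{name}__q50": float(q50),
        f"{name}__q90": float(q90),
        f"{name}__q99": float(q99),   # tail-sensitive extreme-object summary
        f"{name}__max": float(np.max(x)),
        f"{name}__count": int(x.size),
    }
```

Because every set yields the same keys regardless of object count, the pooled dictionaries concatenate directly into the tabular set-level design matrix.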
2.4.1. Pore Morphology Summary Features
Pore morphology proxies summarize the size, shape, and population structure of defect pores within each set. Object-level pore descriptors include geometric measures such as volume, equivalent diameter, surface area, and sphericity; these are pooled per set using the statistics above, yielding a fixed set-level pore feature vector. Tail-sensitive summaries (high quantiles) are retained to reflect the process relevance of extreme pores. Where pore counts are low or absent for a set, the resulting set-level pore block is treated as partially observed rather than forced to zero, and missingness is handled under the policy in Section 2.5.
2.4.2. Prior-β Grain Morphology Summary Features
Prior-β grain morphology proxies are constructed analogously from object-level grain measurements, capturing microstructural anisotropy and characteristic length scales. Representative descriptors include grain aspect ratio and equivalent diameter (or closely related size measures, depending on the raw measurement schema). These descriptors are pooled per set using the same moment and quantile statistics, forming a grain-summary block that is compatible with the unified set-level modeling table. Because grain measurements may be present for a subset of sets only, their availability is tracked explicitly and incorporated into missingness accounting (Figure 2e).
2.4.3. Stress–Strain-Derived Descriptors
When stress–strain curves are available, they provide an additional source of morphology-adjacent proxy descriptors that can be used when direct imaging modalities are incomplete. In this case, each curve is reduced to a compact feature representation using physically interpretable summaries derived from the curve shape, such as slope-based stiffness proxies, characteristic stress and strain landmarks, and scalar descriptors that capture hardening behavior. These descriptors are treated as a separate modality block and incorporated at the set level under the same unified feature table, with their missingness recorded and handled using the procedures in Section 2.5 (Figure 2e).
More modest or inconsistent gains from morphology augmentation for certain targets should not be interpreted as evidence that the prior-β grain descriptors are irrelevant. Rather, the prior-β grain block captures only one part of the latent microstructural state. Stress–strain-derived descriptors integrate the cumulative effects of prior-β morphology together with pore population, α/α′ lath morphology, α/β phase constitution, retained β, texture, residual stress, and lattice-defect/dislocation structure. Accordingly, only partial correlations between grain summaries and stress–strain descriptors are expected, and the physically relevant signal is better interpreted at the modality/latent-embedding level than at the level of simple pairwise raw-feature correlation.
2.5. Missingness, Imputation Policy, and Feature Filtering
The compiled set-level table is multimodal and partially observed because modality availability is non-uniform across sets, with certain sets lacking pore, grain, or stress–strain descriptors depending on upstream data availability. Missingness is treated as an intrinsic property of the study design rather than an error condition. Missingness is quantified at both the feature level and the modality-block level to separate sporadic missingness from structurally absent modalities (Figure 2e). Targets are also incompletely observed for specific properties; modeling for a given target is performed only on the labeled subset for that property (Figure 2c).
To control dimensionality and stabilize learning under small-N conditions, feature filtering is applied prior to model fitting using a coverage threshold defined as a minimum required non-null fraction across sets (default criterion: at least 0.70 non-null coverage). This coverage filtering is performed at the feature-definition stage, and the resulting retained feature list is used consistently across subsequent modeling blocks to prevent feature drift and preserve comparability (Figure 2d,e). Constant or near-constant columns are additionally removed to avoid degenerate predictors.
For remaining missing values within retained features, imputation is performed using a fold-respecting rule in which imputation parameters are estimated on the training portion of each fold and applied to the corresponding validation or test portion. Median imputation is adopted as the default policy because it is robust under heavy-tailed distributions and small-sample settings and does not impose parametric assumptions that are rarely defensible in sparse multimodal materials datasets. Denoting the training-set median for feature j in fold k by $m_j^{(k)}$, missing values are imputed as
$$\tilde{x}_{ij} = \begin{cases} x_{ij}, & \text{if } x_{ij} \text{ is observed}, \\ m_j^{(k)}, & \text{if } x_{ij} \text{ is missing}. \end{cases} \tag{5}$$
When a model requires standardized inputs, standardization is likewise fit on the training portion of each fold only. With training mean $\mu_j^{(k)}$ and standard deviation $\sigma_j^{(k)}$, the standardized value is
$$z_{ij} = \frac{\tilde{x}_{ij} - \mu_j^{(k)}}{\sigma_j^{(k)}}, \tag{6}$$
with $\mu_j^{(k)}$ and $\sigma_j^{(k)}$ computed exclusively from the training split. This fold-conditional preprocessing prevents information leakage by disallowing test-fold statistics from influencing training-fold preprocessing; details are provided in Appendix A.5.
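A minimal sketch of the fold-local policy: all statistics (medians, means, standard deviations) come from the training split, and the held-out split is transformed with those frozen values. In practice a scikit-learn Pipeline with SimpleImputer and StandardScaler gives the same guarantee; this explicit version just makes the leakage barrier visible.

```python
import numpy as np

class FoldSafePreprocessor:
    """Median imputation plus standardization with every statistic
    estimated on the training split only (a sketch, not the study's
    exact implementation)."""

    def fit(self, X_train: np.ndarray) -> "FoldSafePreprocessor":
        self.median_ = np.nanmedian(X_train, axis=0)
        filled = np.where(np.isnan(X_train), self.median_, X_train)
        self.mean_ = filled.mean(axis=0)
        self.std_ = filled.std(axis=0)
        self.std_[self.std_ == 0.0] = 1.0  # guard degenerate columns
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        filled = np.where(np.isnan(X), self.median_, X)  # train medians only
        return (filled - self.mean_) / self.std_          # train mean/std only
```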
2.6. Train/Test Protocol: GroupKFold by set_id and Leakage Controls
Model evaluation follows a 5-fold grouped cross-validation protocol in which all rows sharing the same set_id are assigned to the same fold. This prevents optimistic bias caused by within-set correlations by ensuring that no set contributes to both training and test partitions within a fold. Let $\mathcal{G}$ denote the set of unique set_id groups. For fold $k \in \{1, \dots, 5\}$, the training and test group sets $\mathcal{G}_{\mathrm{train}}^{(k)}$ and $\mathcal{G}_{\mathrm{test}}^{(k)}$ satisfy Equation (7),
$$\mathcal{G}_{\mathrm{train}}^{(k)} \cap \mathcal{G}_{\mathrm{test}}^{(k)} = \varnothing, \qquad \mathcal{G}_{\mathrm{train}}^{(k)} \cup \mathcal{G}_{\mathrm{test}}^{(k)} = \mathcal{G}. \tag{7}$$
The normalized split file spans all 42 sets, with fold sizes of train/test = 33/9 for folds 1–2 and 34/8 for folds 3–5 (Figure 2f).
Leakage controls are enforced by performing every data-dependent operation strictly within each training fold and applying the learned transformation to the corresponding test fold; this includes imputation parameters, standardization parameters for scale-sensitive models, and any learned transformations used within the modeling pipeline. Hyperparameters are tuned via discrete grids using training-fold internal validation, and the selected best setting is recorded per fold and target.
For targets with partial label availability, evaluation is performed on the labeled subset within each fold’s held-out partition. For roughness, 28 of 42 sets are labeled; folds contribute to the reported metrics through the labeled test instances only, avoiding artificial inflation of performance from label handling. Performance is reported as the mean and standard deviation across folds for each target using consistent metrics to enable direct comparison across models and modality variants (Figure 3a).
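The labeled-subset scoring rule can be sketched as a masked metric, assuming missing labels are encoded as NaN (an encoding assumption for illustration):

```python
import numpy as np

def labeled_subset_rmse(y_true, y_pred) -> float:
    """Score a held-out partition on labeled instances only, as done
    for partially labeled targets such as roughness; NaN marks a
    missing label and is excluded rather than imputed."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~np.isnan(y_true)
    return float(np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2)))
```

Excluding unlabeled instances keeps the metric honest: a prediction for an unlabeled set contributes nothing, so label handling cannot inflate or deflate the reported score.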
3. Surrogate Modeling Framework
Surrogate models are formulated to predict each set-level target using the leakage-resistant 5-fold GroupKFold protocol grouped by set_id (Section 2.6), with fold-local preprocessing for operations that depend on data statistics (Section 2.5); details are provided in Appendix B.1.
The modeling workflow proceeds from (i) tabular-only baselines using L-PBF process descriptors and engineered physics features to (ii) hybrid multimodal surrogates that append set-aligned morphology and proxy descriptor blocks (pores, prior-β grains, and stress–strain descriptors) under a consistent set-level unit of analysis. The locked modeling, preprocessing, and decision-layer settings adopted throughout the study are summarized in Table 2.
3.1. Baseline Tabular Models and CPU-Only Training Setup
Surrogate models are trained to predict each set-level target from the L-PBF process descriptors and engineered physics features defined in Section 2.2, evaluated under the GroupKFold protocol in Section 2.6. A suite of tabular regressors spanning linear, kernel, bagging, and boosted-tree families establishes a CPU-feasible reference for each property: ridge regression, k-nearest-neighbors regression, RBF-kernel support vector regression, random forest regression, extremely randomized trees, histogram-based gradient boosting, and gradient-boosted decision trees; details are provided in Appendix B.2.
All experiments are executed under a CPU-only workflow on an Intel(R) Xeon(R) Silver 4214 CPU @ 2.20 GHz (no GPU acceleration). Models are refit from scratch within each fold. Targets are modeled independently (one regressor per property) due to differences in label availability and noise structure and to allow for target-specific model selection under the same evaluation protocol.
Preprocessing steps that use fold-dependent statistics (imputation and any scaling) are fit on the training portion only and applied unchanged to the held-out portion. Coverage-based feature filtering is applied at the feature-definition stage using the full set-level table to remove sparse or degenerate columns, while fold-wise imputation and scaling remain strictly confined to training-only statistics (Section 2.5). Baseline performance is reported as mean and standard deviation across folds; the best-performing baseline per target is identified under this fixed protocol and carried forward as the reference for later modality deltas (Table 3).
3.2. Hybrid Multimodal Feature Construction
Hybrid multimodal surrogates extend the tabular baseline by concatenating set-level morphology and proxy descriptors derived from additional modalities (Section 2.4). The unified design matrix is assembled by aligning and merging four set-level blocks by set_id: (i) tabular process and engineered physics descriptors, (ii) pore morphology summary features, (iii) prior-β grain morphology summary features, and (iv) stress–strain-derived descriptors (when available) retained as a proxy block; details are provided in Appendix A.6. Modality coverage and missingness are tracked explicitly (Figure 2e).
Because modality blocks differ in dimensionality and sparsity, hybrid construction applies controlled feature screening and fold-safe imputation. Candidate columns are screened by non-null coverage, and degenerate columns are removed; the remaining missing entries are imputed using fold-specific training medians (Section 2.5). Hybrid variants are constructed as modality ablations to isolate incremental contributions, including tabular-only, tabular + pores, tabular + grains, tabular + pores + grains, tabular + stress, and tabular + all-morphology. These variants are trained and evaluated under the same GroupKFold protocol as the tabular baselines, enabling direct comparisons under identical leakage controls and CPU constraints (Table 3; Figure 3a,b).
Two integration settings are supported within the same framework. In the direct-fusion setting, pooled morphology statistics are appended as explicit numeric features to enable cross-modal interactions in downstream learners. In the deployable two-stage setting, modality embeddings are learned within each training fold and then used as oracle embeddings when present or predicted from the tabular block when absent, separating the representational value of morphology from deployment-time availability constraints.
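The block-merging and ablation logic can be sketched as left-joins of set-aligned frames onto the tabular base, so that sets lacking a modality keep NaN entries for downstream fold-safe imputation. Block contents and column names here are hypothetical; the real blocks follow Appendix A.6.

```python
import pandas as pd

ABLATIONS = {
    "tabular": [],
    "tabular+pores": ["pores"],
    "tabular+grains": ["grains"],
    "tabular+pores+grains": ["pores", "grains"],
    "tabular+stress": ["stress"],
    "tabular+all": ["pores", "grains", "stress"],
}

def assemble_variant(tabular: pd.DataFrame, blocks: dict,
                     variant: str) -> pd.DataFrame:
    """Left-merge the requested modality blocks onto the tabular base;
    missing modalities remain NaN rather than being dropped or zeroed."""
    X = tabular.copy()
    for name in ABLATIONS[variant]:
        X = X.merge(blocks[name], on="set_id", how="left")
    return X

tabular = pd.DataFrame({"set_id": [1, 2], "P_W": [200.0, 250.0]})
blocks = {
    "pores": pd.DataFrame({"set_id": [1], "pore_dia__mean": [55.0]}),
    "grains": pd.DataFrame({"set_id": [1, 2], "grain_ar__mean": [2.1, 1.8]}),
    "stress": pd.DataFrame({"set_id": [2], "ss_slope": [110.0]}),
}
X_pores = assemble_variant(tabular, blocks, "tabular+pores")
```

The left-join convention keeps the row set fixed across ablations, so variant comparisons differ only in columns, never in which recipes are modeled.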
3.3. Hyperparameter Selection and Evaluation Metrics
Hyperparameter selection followed a leakage-controlled workflow consistent with the 5-fold GroupKFold protocol by set_id, such that candidate configurations were trained using training data only within each fold and the held-out fold was accessed only for final scoring.
All data-dependent operations that can transmit distributional information (including imputation, standardization, and any coverage-based filtering rules) were confined to training-fold computations and then applied unchanged to the corresponding held-out fold to preserve the leakage barrier.
Hyperparameters were tuned via discrete grids using training-fold internal validation, with the selected setting recorded per fold and per target to maintain auditability under small-N conditions; full tuning grids and statistical reporting details are given in Appendix B.2 and Appendix B.4.
Table 2 formalizes the invariant methodological assumptions adopted throughout the study, including the recipe-level unit of analysis (set_id), the grouped cross-validation design, and strictly fold-local preprocessing operations that preclude information leakage. By locking these protocol elements across all model families, the empirical comparisons reported in subsequent sections isolate the effect of the surrogate and uncertainty modules from confounding variation in data handling. In addition, the morphology and co-design components are treated as version-controlled, archived configurations, ensuring that both predictive evaluations and downstream design recommendations are reproducible under an identical experimental specification. For boosted-tree libraries, early stopping was used only when supported by the installed package API; otherwise, a fixed estimator budget was used and model selection proceeded through the declared discrete grid to preserve version-robust reproducibility.
Evaluation was reported using complementary error and goodness-of-fit metrics to capture both absolute deviation and explained variance across targets with different scales. Root mean squared error (RMSE) was used as the primary scale-dependent metric due to its sensitivity to large deviations, mean absolute error (MAE) was reported as a robust companion metric, and the coefficient of determination (R²) was reported relative to a constant-predictor baseline.
Using targets $y_i$ and predictions $\hat{y}_i$ over n test instances, the metrics are
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$
where $\bar{y}$ is the mean of the held-out targets.
All metrics were computed on the held-out fold and aggregated across the five folds as mean ± standard deviation to capture both expected performance and partition sensitivity in the small-N regime.
For the roughness target, only 28 of 42 sets are labeled; evaluation therefore uses labeled instances in each fold’s held-out partition only, and folds contribute to reported metrics through those labeled test instances.
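The three reported metrics can be computed per held-out fold with a few NumPy lines before the mean ± standard deviation aggregation across folds; note that R² is measured against a constant-mean baseline, so a constant predictor scores exactly zero.

```python
import numpy as np

def regression_metrics(y_true, y_pred) -> dict:
    """RMSE, MAE, and R^2 relative to a constant-mean baseline,
    computed for one held-out fold."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {"RMSE": rmse, "MAE": mae, "R2": 1.0 - ss_res / ss_tot}
```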
The main comparative results are summarized in Table 3, and model settings and the locked feature schema are documented in Table 2; residual-level diagnostics are examined alongside these tabulated scores because systematic bias and heteroscedastic error patterns are not visible in scalar metrics alone.
3.4. Uncertainty Estimation Strategy
Predictive uncertainty was estimated using fold ensembles induced by the GroupKFold partitions. For each target, one model per fold was trained using the selected architecture and target-specific feature variant; at inference, each input x yields an ensemble of fold-trained predictions $\{\hat{y}^{(k)}(x)\}_{k=1}^{K}$ with $K = 5$. The predictive mean was defined as the ensemble mean,
$$\hat{\mu}(x) = \frac{1}{K}\sum_{k=1}^{K} \hat{y}^{(k)}(x),$$
and epistemic spread was summarized using the ensemble standard deviation,
$$\hat{\sigma}(x) = \sqrt{\frac{1}{K-1}\sum_{k=1}^{K}\left(\hat{y}^{(k)}(x) - \hat{\mu}(x)\right)^2}.$$
These summaries are deployable in the sense that they require only the trained fold models and do not assume access to additional calibration labels at inference time.
Uncertainty was used in two roles: (i) surrogate-level diagnostics via coverage-style calibration checks (Figure 3b), where predicted uncertainty is assessed against empirical error behavior across nominal quantiles, and (ii) decision-making in co-design to penalize overly optimistic candidates through a conservative lower-confidence bound (LCB) in Equation (14),
$$\mathrm{LCB}(x) = \hat{\mu}(x) - z\,\hat{\sigma}(x), \tag{14}$$
where z controls conservativeness.
The co-design recommendation table (Table 4) reports mean predictions and uncertainty summaries to keep the ranking and screening logic transparent and auditable.
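The ensemble summaries and the conservative bound reduce to a few array operations; this sketch assumes fold predictions stacked as a (K, n_candidates) array, with z as the tunable conservativeness weight.

```python
import numpy as np

def ensemble_lcb(fold_preds: np.ndarray, z: float = 1.0) -> dict:
    """Ensemble mean, dispersion, and lower-confidence bound from K
    fold-trained models (a sketch of the decision layer; for targets
    to be minimized the sign convention flips to an upper bound)."""
    mu = fold_preds.mean(axis=0)            # ensemble mean per candidate
    sigma = fold_preds.std(axis=0, ddof=1)  # epistemic spread per candidate
    return {"mean": mu, "std": sigma, "lcb": mu - z * sigma}
```

Ranking candidates by LCB rather than mean demotes recipes whose fold models disagree, which is exactly the brittleness the text aims to screen out.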
3.5. Optimizer Ablation Protocol
An optimizer ablation quantified sensitivity of training dynamics and generalization performance to optimizer choice under an identical model architecture, identical feature set, and the same 5-fold GroupKFold protocol by set_id, so that observed differences reflect optimization behavior rather than data partition effects. The study compared AdamW, Muon, and a hybrid strategy in which Muon was applied selectively to a subset of parameters (hidden layers) while AdamW was retained for remaining parameter groups; additional optimizer-ablation settings, learning-rate sweeps, and selection rules are detailed in Appendix B.3.
The optimizer study was framed as a robustness evaluation for a fixed neural tabular regressor, with identical splits and identical preprocessing across optimizers to isolate optimizer effects. For each target property, training was repeated across multiple random seeds for each optimizer to separate optimizer-induced variance from partition-induced variance. Within each fold, each optimizer used the same initialization scheme, stopping criterion, and maximum training budget; performance was recorded on the held-out fold using RMSE as the primary metric, with MAE and R² tracked as secondary diagnostics.
Learning-rate operating regions were characterized via grid sweeps for each optimizer. A discrete set of learning rates was evaluated for each target, and the best-performing learning rate was selected based on fold-aggregated validation performance, subject to stability screening (excluding configurations that diverged or produced degenerate predictions).
The resulting score landscapes were summarized, best achievable operating points per optimizer and target were summarized as mean ± standard deviation across runs, and run-to-run dispersion at the selected learning rate was reported via boxplots (
Figure 4a). To avoid confounding from feature-definition differences, the optimizer ablation used a locked feature schema and a fixed preprocessing pipeline, with median imputation and any scaling computed on training-only partitions and applied with frozen parameters to validation/test partitions following the leakage controls in
Section 2.5 and
Section 2.6.
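The training-only preprocessing with frozen parameters described above can be sketched with a scikit-learn pipeline, which refits imputation and scaling inside each training fold and applies the frozen transforms to the held-out fold automatically. The Ridge regressor here is a stand-in; the study's surrogates differ:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def leakage_safe_scores(X, y, groups, n_splits=5):
    """Median imputation and scaling are fit on training partitions only;
    held-out partitions are transformed with the frozen fold parameters."""
    model = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("reg", Ridge()),  # illustrative stand-in regressor
    ])
    cv = GroupKFold(n_splits=n_splits)  # groups = set_id in the study
    return cross_val_score(model, X, y, groups=groups, cv=cv,
                           scoring="neg_root_mean_squared_error")
```

Placing the preprocessing steps inside the cross-validated estimator is what guarantees that no held-out statistics leak into imputation or scaling.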
4. Morphology Module: Two-Stage Embedding and Deployable Multimodal Inference
Morphology and microstructure proxies (pore statistics, prior-β grain descriptors, and stress–strain-derived signatures) carry mechanistically relevant information, yet their availability is non-uniform across sets, creating a fundamental train-time-versus-deploy-time information mismatch. A two-stage morphology module is therefore used to separate an oracle regime, in which proxies are observed at inference, from a deployable regime, in which low-dimensional proxy embeddings are learned within training folds and then predicted from process-derived inputs when proxies are absent.
4.1. Oracle Morphology vs. Predicted Morphology Concept
Morphology and microstructure proxies, including pore morphology, prior-β grain morphology, and stress–strain-derived descriptors, are informative but incompletely observed across sets. This creates a practical distinction between a train-time setting in which proxy descriptors are available for supervised learning and a deploy-time setting in which these descriptors may be absent for a new candidate process recipe. The morphology module is therefore formulated as a two-stage mechanism that separates (i) the informational value of morphology-conditioned inference when proxy descriptors are observed from (ii) the feasibility of producing morphology-conditioned predictions when proxy descriptors are missing.
Two inference regimes are considered. In the oracle regime, proxy descriptors are treated as directly observed inputs at inference time. This regime estimates an upper bound on achievable surrogate performance when multimodal measurements are available and is used to quantify the potential benefit of morphology-aware learning. In the deployable regime, proxy descriptors are not assumed to be observed for candidate designs. Instead, proxy embeddings are first predicted from the available tabular/process features, and the predicted embeddings are then used as substitutes for the missing proxy inputs in the final property surrogate. This yields a deployable multimodal inference path that can be applied to co-design candidate generation, where only process parameters and engineered physics features are available.
Formally, let x denote the tabular/process feature vector (including engineered physics features), and let m denote a modality-specific proxy feature block (pores, grains, or stress). The oracle surrogate is defined in Equation (15),
ŷ = f_oracle([x; m]), (15)
where [· ; ·] denotes concatenation. The deployable two-stage surrogate is defined by Equations (16) and (17): an embedding predictor first estimates the low-dimensional proxy embedding from the tabular/process features,
ẑ = g(x), (16)
and a property surrogate predicts the target using the predicted embedding,
ŷ = f_dep([x; ẑ]). (17)
This separation makes explicit the train-time versus deploy-time information structure and avoids reporting oracle-only gains as deployable improvements. The comparison between oracle and deployable variants is summarized in the morphology-module results table and is used downstream to select target-specific deployable variants for co-design.
4.2. PCA Embedding Construction on Training Folds
Raw proxy feature blocks are high-dimensional and heterogeneous across modalities (for example, tens of pore summary features, a smaller set of grain descriptors, and a larger stress–strain descriptor set). To obtain compact representations that are easier to predict from tabular inputs and less prone to overfitting under small-sample conditions, each proxy modality is mapped to a low-dimensional embedding using principal component analysis (PCA). PCA fits strictly within each training fold to preserve leakage control, and the learned transformation is then applied to the corresponding held-out fold.
For each modality, a training-fold morphology matrix is constructed by selecting the modality-specific feature columns, coercing to numeric types, and applying training-only imputation. Let M_k^train denote the training-fold matrix for fold k. PCA is fit to M_k^train, and the number of retained components d_k is chosen using a cumulative explained-variance criterion with an upper cap d_max to avoid excessive flexibility under small n. Specifically, the smallest d_k is selected such that cumulative explained variance exceeds 95%, subject to d_k ≤ d_max. The resulting training-fold embedding is Z_k^train = PCA_k(M_k^train), and the held-out embedding is obtained by applying the same frozen transformation to the test-fold matrix M_k^test, Z_k^test = PCA_k(M_k^test).
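A minimal sketch of the fold-local PCA fit with the 95% cumulative-explained-variance rule follows; the cap value of 8 components is an assumed placeholder, since the text specifies only that an upper cap is applied:

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_fold_pca(M_train, var_target=0.95, cap=8):
    """Fit PCA on the training-fold morphology matrix only.

    The number of components is the smallest count whose cumulative
    explained variance exceeds var_target, limited by `cap` (illustrative
    value) and by the matrix rank bound.
    """
    probe = PCA().fit(M_train)
    cum = np.cumsum(probe.explained_variance_ratio_)
    d = int(np.searchsorted(cum, var_target) + 1)
    d = min(d, cap, min(M_train.shape))
    return PCA(n_components=d).fit(M_train)

def embed(pca, M):
    """Apply the frozen training-fold basis; never refit on held-out data."""
    return pca.transform(M)
```

Refitting the PCA inside each training fold, then reusing the frozen basis on the held-out fold, is what keeps the embedding construction consistent with the leakage controls.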
This fold-specific fitting ensures that no distributional information from held-out sets influences the embedding basis. Because proxy availability differs by modality and by set, modality-specific usability checks are applied per fold. If a modality has insufficient observed values or degenerate variance after preprocessing in a given fold, the modality embedding is treated as unavailable for that fold to avoid unstable PCA solutions. The final embedding dimensionalities used in experiments are therefore reported explicitly (per modality) alongside the downstream property-surrogate results. The PCA embeddings constitute the interface between proxy observation and deployable inference; they serve as the targets for morphology prediction in the deployable pipeline and as conditioning variables for the oracle pipeline. A necessary condition for deployable multimodal inference is that morphology embeddings can be estimated from the available process-derived feature set with non-trivial fidelity [
22,
23,
24]. Embedding predictability is therefore quantified directly by training fold-respecting predictors from tabular/process features to each PCA embedding coordinate and reporting out-of-fold R² values. This diagnostic is reported at the embedding level, not only at the downstream property level, because a two-stage pipeline can fail either due to weak embedding predictability or due to limited incremental utility of morphology even when predicted accurately.
Let x_i denote the tabular/process feature vector for set i, and let z_i^(m,k) denote the PCA embedding for morphology modality m constructed within training fold k (Section 4.2), with fold-specific PCA fit on the training partition only and then applied to the held-out fold. Because PCA fits within each fold, embedding coordinates are fold-local; coordinate-wise predictability is interpreted as a within-fold diagnostic rather than a globally fixed latent axis. For each fold k, a predictor h_(m,k) is trained on the training partition and evaluated on the held-out partition. Predictability is summarized using the coefficient of determination computed out-of-fold for each embedding coordinate, R² = 1 − Σ_i (z_i − ẑ_i)² / Σ_i (z_i − z̄)².
These diagnostics enable modality-wise screening for deployable inference; modalities with consistently low embedding predictability indicate limited learnability of the corresponding embeddings from the available process covariates under the present dataset and coverage, implying that oracle-only improvements should not be interpreted as deployable gains. This diagnostic is referenced in the morphology-module results (
Figure 5, morphology-module panel (a)) and in the selection of deployable variants used for co-design (
Section 4.4).
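The coordinate-wise out-of-fold R² diagnostic can be sketched as follows. For brevity this sketch uses a single shared embedding matrix, whereas the study's PCA bases are fold-local, and the gradient-boosting predictor class is a stand-in choice:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GroupKFold

def embedding_predictability(X, Z, groups, n_splits=5):
    """Out-of-fold R² per embedding coordinate.

    A fold-respecting predictor maps tabular/process features X to each
    embedding coordinate of Z; predictions are collected on held-out
    groups only, and R² is computed coordinate-wise on the pooled
    out-of-fold values.
    """
    n, d = Z.shape
    oof = np.full((n, d), np.nan)
    cv = GroupKFold(n_splits=n_splits)
    for tr, te in cv.split(X, groups=groups):
        for j in range(d):
            reg = GradientBoostingRegressor(random_state=0)
            reg.fit(X[tr], Z[tr, j])
            oof[te, j] = reg.predict(X[te])
    return np.array([r2_score(Z[:, j], oof[:, j]) for j in range(d)])
```

Low R² on every coordinate of a modality flags that modality as weakly learnable from process covariates, which is exactly the screening signal used before trusting its predicted embeddings in the deployable pipeline.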
4.3. Deployable Two-Stage Pipeline Used in Co-Design
The final co-design workflow requires property prediction for candidate process recipes before any physical build, which precludes direct observation of pores, grains, or stress–strain descriptors [
25,
26,
27]. The deployable surrogate therefore uses a two-stage inference pathway in which morphology information is represented by predicted PCA embeddings derived from process variables.
4.3.1. First-Stage Embedding Prediction
For each morphology modality selected for a given target, a modality-specific predictor is trained to map tabular/process features to the modality embedding, ẑ_m = g_m(x), with g_m trained under the same GroupKFold protocol as the property surrogates. Training-only preprocessing is applied within each fold, including median imputation and any scaling required by the chosen model class (Section 2.5). Importantly, the PCA used to define the embedding targets is fit on the training partition only (Section 4.2), so the embedding predictor does not access held-out sets beyond strict cross-validation allowances.
4.3.2. Second-Stage Property Prediction
The predicted embeddings are concatenated with the tabular/process features to form the deployable multimodal input, and a target-specific surrogate is trained to predict the mechanical or surface property of interest, ŷ = f_dep([x; ẑ]). Model selection at this stage is target-specific and constrained to deployable variants, meaning that oracle-only inputs are excluded when choosing the final surrogate used in design-space exploration. When multiple modality combinations are feasible, the best deployable variant is selected per target based on mean cross-validated RMSE (and associated R²), with the corresponding configuration recorded for reproducibility.
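A minimal sketch of the deployable two-stage inference path follows. The model classes, and the choice to fit stage two on predicted rather than oracle embeddings, are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class TwoStageSurrogate:
    """Stage 1 predicts the morphology embedding from process features;
    stage 2 predicts the property from the concatenation [x; z_hat]."""

    def __init__(self):
        self.embed_models = []
        self.prop_model = GradientBoostingRegressor(random_state=0)

    def fit(self, X, Z, y):
        # Stage 1: one predictor per embedding coordinate.
        self.embed_models = [
            GradientBoostingRegressor(random_state=0).fit(X, Z[:, j])
            for j in range(Z.shape[1])
        ]
        # Stage 2 is fit on predicted embeddings so that train-time inputs
        # match the deploy-time input distribution (an assumed design choice).
        Z_hat = self._predict_embedding(X)
        self.prop_model.fit(np.hstack([X, Z_hat]), y)
        return self

    def _predict_embedding(self, X):
        return np.column_stack([m.predict(X) for m in self.embed_models])

    def predict(self, X):
        """Deploy-time path: no measured morphology required."""
        return self.prop_model.predict(
            np.hstack([X, self._predict_embedding(X)]))
```

At deploy time only process parameters and engineered physics features are supplied; the morphology conditioning enters entirely through the predicted embeddings.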
4.4. Uncertainty Summaries for Design-Time Ranking
Co-design requires not only point predictions but also a notion of predictive uncertainty to avoid brittle recommendations. For each candidate, the ensemble of fold-trained surrogates produces a distribution of predictions; the empirical mean serves as the primary estimate and the empirical standard deviation across the fold models serves as a conservative uncertainty proxy. These summaries are used to implement risk-aware selection and constraint handling in candidate screening, where feasibility is evaluated under uncertainty-aware criteria rather than single-point estimates. This deployable two-stage surrogate is used to generate the final recommendation table for co-design (
Table 4) and to support the Pareto and risk-aware analyses in the co-design section (
Figure 6).
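The fold-ensemble uncertainty summaries and conservative-bound screening can be sketched as follows. The composite score and the bound multiplier k are illustrative; only the mean/std summaries and the LCB/UCB feasibility logic are taken from the text:

```python
import numpy as np

def risk_aware_rank(fold_preds_yield, fold_preds_rough,
                    yield_min, rough_max, k=1.0):
    """Risk-aware candidate screening from fold-ensemble predictions.

    fold_preds_*: arrays of shape (n_folds, n_candidates). Feasibility
    uses conservative bounds: a lower confidence bound (mean - k*std)
    for a property to maximize (yield) and an upper confidence bound
    (mean + k*std) for one to minimize (roughness).
    """
    mu_y = fold_preds_yield.mean(axis=0)
    sd_y = fold_preds_yield.std(axis=0)
    mu_r = fold_preds_rough.mean(axis=0)
    sd_r = fold_preds_rough.std(axis=0)
    lcb_y = mu_y - k * sd_y
    ucb_r = mu_r + k * sd_r
    feasible = (lcb_y >= yield_min) & (ucb_r <= rough_max)
    score = np.where(feasible, lcb_y - ucb_r, -np.inf)  # illustrative composite
    order = np.argsort(-score)  # best-first ranking of candidates
    return order, feasible
```

Ranking on bounds rather than means penalizes candidates whose apparent optimality is driven by fold-to-fold model variance, which is the intended guard against brittle recommendations.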
5. Results
Performance evidence is organized to characterize the multimodal set-level learning problem, establish leakage-controlled baselines, and evaluate the deployable value of morphology augmentation through oracle versus predicted embeddings, culminating in uncertainty-aware co-design recommendations.
5.1. Dataset and Feature Characterization
Table 1 summarizes the final set-level dataset used for model development, including the unique set_id groups (1–42), GroupKFold fold membership, and target-specific label availability. The dataset is intrinsically multimodal at the set level, combining (i) L-PBF process parameters and engineered physics features and (ii) morphology and microstructure proxies derived from pore statistics, prior-β grain morphology, and stress–strain descriptors where available.
Figure 2 provides a compact characterization of the learning problem with explicit separation between outputs and inputs.
Figure 2a shows the empirical distributions of the six set-level response variables.
Figure 2b summarizes representative input-feature distributions; importantly, this panel combines the raw process variables retained in the present dataset with deterministic engineered descriptors derived from them and selected proxy features used by the multimodal surrogate.
Figure 2c–f then characterize the input space further through modality-wise missingness, output–input correlation structure, modality coverage across set_id, and representative morphology-feature distributions. This organization is intended to distinguish clearly between the predicted outputs and the heterogeneous input blocks used in the surrogate framework.
Target completeness differs by endpoint; in particular, roughness has partial label availability relative to mechanical properties, and labeled-subset evaluation is used for targets with incomplete labeling. This target-dependent effective sample size is a dataset property that must be carried forward when comparing results across targets.
5.2. Baseline Performance Comparison
Baseline regressors are evaluated under the fixed 5-fold GroupKFold protocol to establish a CPU-only reference for each target.
Table 3 reports fold-aggregated performance (mean ± standard deviation) using RMSE as the primary metric, with MAE and R² reported as complementary diagnostics, and records the best-performing baseline per target under this protocol. Across the six targets, baseline results indicate that strong nonparametric learners provide competitive accuracy under small-n conditions, while target difficulty differs substantially. In particular, the baselines exhibit robust performance for strength and ductility targets (yield strength, UTS, elongation) and elastic modulus, whereas surface roughness and microhardness show weaker baseline performance.
This contrast is consistent with the combination of target distributional characteristics and the limited observability of morphology-dependent information in tabular-only features, especially under partial modality coverage. For partially labeled targets, reported headline values are computed across the subset of folds that contribute labeled test instances, and this reporting convention is maintained to preserve auditability under target-dependent completeness.
Table 3 provides the target-wise, fold-aggregated error statistics computed under the fixed 5-fold GroupKFold protocol, thereby enabling direct comparison across endpoints under a common leakage-controlled evaluation design. Consistent with target-dependent completeness, partially labeled endpoints are evaluated on the labeled subset of held-out sets rather than by imputing labels, preserving the auditability of reported metrics. Under this protocol, strength and ductility targets exhibit substantially higher explained variance than surface roughness and hardness, motivating subsequent sections that test whether additional morphology/proxy information can produce systematic gains beyond process/physics descriptors alone.
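The labeled-subset evaluation convention for partially labeled targets can be sketched as a masked metric; encoding unlabeled instances as NaN is an assumed convention for this illustration:

```python
import numpy as np

def labeled_subset_rmse(y_true, y_pred):
    """RMSE over labeled held-out instances only (NaN = unlabeled).

    Matches the convention of evaluating on the labeled subset rather than
    imputing labels; returns NaN when a fold contributes no labeled
    test instances, so such folds are excluded from headline aggregation.
    """
    y_true = np.asarray(y_true, dtype=float)
    mask = ~np.isnan(y_true)
    if not mask.any():
        return float("nan")
    err = y_true[mask] - np.asarray(y_pred, dtype=float)[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```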
5.3. Final Hybrid Surrogate Performance, Parity Plots, and Global Feature Importance
Table 2 documents the final surrogate family and locked training settings used for the reported models, including the CPU-only implementation, the feature-construction variant adopted per target, and the final hyperparameter choices used in cross-validation.
Table 3 reports the corresponding predictive performance (RMSE, MAE, and R²) aggregated across the five GroupKFold splits, enabling direct comparison to the tabular-only baselines under an identical evaluation protocol.
Figure 3 consolidates the primary performance and interpretability evidence for the final hybrid surrogate.
Figure 3a documents the GroupKFold evaluation protocol and the placement of model selection within each training fold, while
Figure 3b summarizes target-wise performance and fold-to-fold spread using the same metric definitions reported in
Table 3. This organization supports a direct assessment of whether incorporating morphology-derived information yields target-specific gains beyond process/physics features alone, rather than attributing improvements to procedural differences. Parity plots in
Figure 3 provide an essential diagnostic complement to scalar metrics in the small-n regime. Predicted values are plotted against measured values for held-out folds, with the identity line used to assess systematic bias. The fold-aggregated error statistics shown in the panel insets correspond to the values tabulated in
Table 3.
This parity-based view supports inspection of regime-dependent error structure (including outliers and tail behavior evident in the target distributions) that can be obscured by aggregate RMSE alone, and it provides a fold-respecting check against overfitting through consistency of held-out behavior; additional residual-based, modality-sensitivity, and robustness checks are provided in
Appendix B.5. Global feature attribution for the final surrogate is summarized in
Figure 3c. The ranked importances are drawn from the final gradient-boosted model family and each feature is associated with its modality block (process/physics versus morphology-derived descriptors).
This analysis is used as a mechanistic plausibility audit verifying that physically motivated energy-input descriptors and morphology summaries appear among dominant drivers, while remaining explicitly non-causal in interpretation. The feature-importance view is therefore treated as a global diagnostic to contextualize hybrid performance and to motivate the morphology-module analyses that follow.
5.4. Optimizer Stability and Learning Dynamics
Figure 4 evaluates the effect of optimizer choice on training stability and convergence behavior in the low-data regime, using the fixed experimental protocol defined in
Section 3.5.
The primary objective of this analysis is robustness: determining whether reported surrogate accuracy is attributable to modeling choices rather than idiosyncratic training dynamics.
Figure 4a reports fold-aggregated validation trajectories at the selected operating point for each optimizer and target, shown as the median validation loss across GroupKFold splits with interquartile bands. To avoid overinterpreting late-epoch variance after early stopping, only epochs with full 5-fold support are displayed. Under this aggregation, the dominant reduction in validation loss occurs during the early training phase for all targets, while the remaining late-stage variability is modest and is more pronounced for UTS and elongation than for yield. This pattern should not be interpreted as evidence of a distinct second convergence phase. Rather, it is consistent with the greater sensitivity of UTS and elongation to partially observed post-yield microstructural state variables, including α/β phase balance, α/α′ lath morphology, retained β, texture, grain-boundary α, residual stress, and lattice-defect/dislocation state. Accordingly, optimizer selection is based primarily on the fold-aggregated learning-rate operating regions and best-operating-point summaries in
Figure 4b,c, rather than on the terminal value of any single trajectory.
Figure 4a reports stability distributions across repeated runs as boxplots of the validation metric for each optimizer variant, directly diagnosing sensitivity to initialization and run-to-run variability. Learning-rate operating regions are mapped as target-by-learning-rate heatmaps for each optimizer, identifying stable windows and revealing whether tuning tolerance is unusually narrow.
Best operating points are summarized for each optimizer–target pair by reporting the top mean performance with its variability, annotated with the corresponding selected learning rate. The optimizer ablation is used as a sensitivity analysis supporting reproducibility of the reported surrogate results and as justification for the final default optimizer and learning-rate setting adopted in
Table 2.
Consistent with the broader leakage controls, the optimizer study is conducted on a locked feature schema and fixed preprocessing pipeline, with median imputation and any required scaling computed on training-only partitions and applied to held-out partitions using frozen parameters.
5.5. Morphology Module Outcomes: Oracle vs. Predicted Embeddings and Their Impact
Figure 5 reports the outcomes of the two-stage morphology module designed to separate training-time information from deploy-time feasibility. The central comparison is between oracle morphology embeddings, where morphology descriptors are computed directly from measured pore and grain summaries and then embedded within each training fold, and predicted morphology embeddings, where the same embedding coordinates are inferred from process and engineered physics features alone. This distinction is critical for deployment, because morphology measurements are not guaranteed to be available for prospective designs, while process parameters and engineered physics features are always available.
The morphology module and the two-stage training logic are introduced, clearly distinguishing the oracle pathway from the deployable predicted pathway.
Figure 5a reports the embedding predictability diagnostics, quantified by fold-wise R² between oracle embedding coordinates and their process-predicted counterparts; this panel establishes which morphology modalities can be reconstructed with meaningful fidelity from process variables and which remain weakly predictable.
Figure 5b evaluates the downstream impact on property prediction by comparing tabular-only surrogates to hybrid surrogates augmented with oracle embeddings and with predicted embeddings, using the same GroupKFold protocol as in
Figure 3. Improvements attributable to oracle embeddings bound the maximum achievable gain from morphology information under perfect availability, while improvements from predicted embeddings quantify what is realizable under deployable conditions.
Figure 5c summarizes the resulting target-dependent pattern, highlighting cases where morphology information adds measurable value and cases where performance is dominated by tabular process and engineered features.
5.6. Co-Design Recommendations Under Constraints and Uncertainty
Table 4 reports the constrained co-design recommendations produced by the deployable surrogate stack, including the selected process parameter sets, predicted property means, and uncertainty summaries used for risk-aware ranking. Recommendations are generated from a candidate pool sampled within the observed process envelope and evaluated using fold ensembles to obtain both central tendency and dispersion estimates for each target. Constraint handling is performed by screening candidates against minimum performance thresholds and by ranking feasible solutions with uncertainty-aware criteria, ensuring that reported designs reflect both expected performance and predictive confidence.
Predicted targets are reported as fold-ensemble mean ± standard deviation (GroupKFold-safe fold ensembles). Yield LCB and Roughness UCB correspond to the conservative confidence bounds used in the uncertainty-aware selection analysis. Designs are ordered by the composite score used for ranking feasible candidates.
Figure 6 visualizes the co-design outcomes and the logic of recommendation selection.
Figure 6a presents the Pareto front for the primary tradeoff axis, showing the candidate cloud, Pareto-optimal subset, and highlighted top recommendations.
Figure 6b reports constraint satisfaction rates, comparing all candidates to Pareto-optimal and top-ranked subsets to demonstrate the tightening effect of constraints and ranking.
Figure 6c provides the validation view where measured outcomes are available, plotting predicted versus measured performance for the evaluated designs with identity reference and summary metrics.
Figure 6d summarizes the recommended recipe set in an interpretable parameter-space view, enabling rapid comparison of power, scan speed, hatch spacing, layer thickness, and derived energy descriptors across top candidates.
Figure 6e reports the risk-aware selection analysis, contrasting mean performance with uncertainty or lower-confidence bounds to justify the final recommended subset.
Figure 6f presents a local robustness analysis around the selected recipe, quantifying sensitivity to small perturbations in key process variables and indicating whether recommendations lie in locally stable regions rather than fragile optima.
6. Discussion, Conclusions and Data Availability
Interpretation is cast in a hierarchical process–structure–property paradigm, in which process settings and engineered energy-input descriptors index thermal history, with defect and microstructure proxies mediating the observed mechanical and surface responses [
28,
29,
30].
The oracle–deployable comparison is used to delineate morphology information that is recoverable from process variables from irreducible signal that warrants targeted characterization, while uncertainty-aware selection is emphasized to avoid brittle optima under sparse and heterogeneous measurement coverage.
6.1. Materials-Science Interpretation of Dominant Drivers
The results are consistent with a hierarchical process–structure–property framing in which L-PBF process settings and engineered energy-input descriptors act as first-order proxies for the thermal histories that shape defect formation and microstructural evolution, which then mediate macroscopic properties. Within this interpretation, controllable inputs (laser power, scan speed, hatch spacing, layer thickness) together with linear energy density and volumetric energy-density variants provide a compact description of nominal energy input that is physically consistent with established links to melt-pool behavior and solidification conditions, without implying direct measurement of melt-pool stability or thermal gradients. These conditions are reflected downstream through pore population summaries (size, volume fraction, morphology) and prior-β grain morphology (aspect ratio and characteristic length scales), while stress–strain-derived descriptors serve as morphology-adjacent signatures of the combined imprint of microstructure, defects, and residual stress state.
This hierarchy is aligned with the target-dependent value of morphology-derived information observed in the modeling outcomes. Defect-sensitive properties, particularly yield strength and fatigue-relevant surface proxies, are expected to respond to pore volume statistics and sphericity distributions because pores act as stress concentrators and reduce the effective load-bearing area. Where the morphology module indicates usable embedding predictability, the predicted-embedding pathway suggests that parts of this defect-related signal are recoverable from process variables in the present dataset, consistent with the role of energy input and scan strategy in controlling lack-of-fusion and keyhole regimes. More modest or target-specific gains from morphology augmentation should not be interpreted as evidence that the prior-β grain descriptors are irrelevant; rather, the grain-summary block captures only one part of the latent microstructural state, whereas the stress–strain-derived block reflects the integrated response of prior-β morphology together with pores, phase constitution, texture, residual stress, and lattice-defect/dislocation substructure, so only partial correlations between the two are expected.
6.2. What the Multimodal Gains Mean for Nanomaterials-Enabled Functional Surfaces and AM Process Tuning
The multimodal outcomes carry two implications for functional surfaces and process tuning under the present feature and label regime [
31,
32,
33,
34]. First, the oracle-versus-deployable comparison provides an explicit bound on how much morphology information can improve property prediction when morphology is fully available versus when it must be inferred from deployable inputs. Oracle morphology embeddings represent the idealized upper bound in which pore and grain descriptors are available at inference time, whereas predicted morphology embeddings correspond to the deployable setting in which morphology is inferred from process settings. The gap between these conditions quantifies how much of the morphology signal is structurally learnable from process variables alone in the current dataset; a small gap is consistent with treating morphology as an implicit latent state recoverable from process descriptors for prospective screening, whereas a large gap indicates irreducible morphology information not encoded in process variables and therefore motivates targeted characterization of a minimal morphology panel that most improves decision-making.
Second, for surface-facing objectives as represented here by surface condition constraints and roughness-linked proxies, the co-design and risk-aware selection workflow supports joint tuning of bulk mechanical objectives and surface-relevant objectives under uncertainty. Even when direct surface or bio-proxy measurements are sparse, uncertainty-aware ranking reduces the likelihood of selecting candidates that appear optimal only due to model variance. The constraint-satisfaction analysis and Pareto visualization formalize tradeoffs central to functional implants and engineered interfaces, where high strength must coexist with acceptable roughness and surface condition constraints [
35,
36,
37]. Practically, the workflow identifies process windows expected to deliver compliant bulk properties while maintaining surface-relevant bounds and highlights regimes where predicted uncertainty is high and additional experiments would be maximally informative.
6.3. Conclusions
This work presents a deployment-oriented surrogate modeling framework for L-PBF Ti-6Al-4V that explicitly reflects the constraints of small, set-level experimental datasets and partially observed morphology descriptors. By enforcing recipe-level grouping (GroupKFold by set_id) and fold-respecting preprocessing, the study provides a leakage-resistant basis for comparing process-only and multimodal predictors.
A key contribution is the principled separation of oracle multimodal inference (morphology available at decision time) from a deployable setting in which morphology must be inferred from process-accessible inputs via a two-stage embedding strategy. Across targets, the resulting models achieve strong performance for primary mechanical properties while underscoring that roughness and hardness remain difficult, likely reflecting label sparsity and missing or weakly captured surface-state determinants. Finally, the framework operationalizes uncertainty through fold-ensemble variability and integrates this signal into a constraint-aware co-design workflow, enabling conservative screening and recommendation of candidate recipes within the feasible process envelope. Collectively, the study advances a reproducible and practically actionable template for data-scarce AM optimization, bridging multimodal learning and decision-making under uncertainty. More broadly, the present framework is consistent with recent efforts in other safety-critical engineering domains to combine physically informed modeling, hybrid learning, and calibrated uncertainty estimation to improve deployment trustworthiness [
37].
Although developed here for L-PBF Ti-6Al-4V, the framework is general to other process–structure–property systems with small, partially observed multimodal datasets. Extension to a new alloy or manufacturing route would require redefinition of the relevant process descriptors, morphology/microstructure proxies, and targets, followed by system-specific retraining and prospective validation of the surrogate and embedding models.
6.4. Data and Code Availability
The base dataset underpinning this study is publicly available and can be obtained from Zenodo (record 6587905):
https://zenodo.org/records/6587905 (accessed on 31 March 2026) [
38]. Derived datasets and research artifacts produced in the course of this work, including curated set-level master tables, engineered physics features (e.g., line energy density and energy-density variants), morphology and microstructure summary features (pore and prior-β grain descriptors), stress–strain-derived descriptors, finalized feature filters, GroupKFold split definitions, trained-model outputs (fold-level predictions), and aggregated performance summaries, have not been deposited in a public repository. These derived materials are available from the corresponding author upon reasonable request and subject to any applicable data-use, privacy, or third-party restrictions.
The code used to generate the results reported in this study is not publicly available at this time, owing to practical constraints related to project-specific dependencies, environment configuration, and associated research artifacts. However, the code can be made available by the corresponding author upon reasonable request, subject to any applicable institutional, licensing, and third-party restrictions.