1. Introduction
Polymer matrix composites (PMCs) are widely employed in structural and multifunctional applications owing to their versatility, tunable architectures, and capacity to integrate multiple functionalities within a single material platform. Beyond conventional fiber-reinforced PMCs, increasing research attention has focused on porous and cellular polymer composites, which exhibit enhanced mass transport, large interfacial areas, and reduced density. These attributes render them particularly attractive for applications including adsorption, separation, catalysis, sensing, and environmental remediation [1,2,3].
Within this broader class, polymers templated from high internal phase emulsions (polyHIPEs) constitute a powerful family of macroporous PMCs, as their pore size, window size, interconnectivity, and overall openness can be systematically tailored through formulation and processing parameters [4,5,6,7]. As a result, polyHIPE-based composites have been extensively investigated as functional materials in which performance is governed not only by polymer chemistry but also by the hierarchical organization of the porous network.
Structural descriptors such as pore diameter, pore–window connectivity, degree of openness, and specific surface area play a central role in governing transport phenomena and interfacial interactions during adsorption and separation processes [6,7,8,9]. The incorporation of functional fillers further expands the design space of polyHIPEs, enabling the development of multifunctional macroporous PMCs. In particular, embedding magnetic nanoparticles within polymeric porous matrices allows adsorption processes to be coupled with magnetic separability, thereby facilitating material recovery, reuse, and process intensification in aqueous systems [10,11,12].
A growing body of experimental literature has reported magnetic polyHIPE-based composites employing comparable formulation strategies, porous architectures, and functional objectives. Among these studies, the work of Vallejo-Macías et al. [13] provides a representative and experimentally well-documented example, describing macroporous polyacrylamide polyHIPE composites coated with maghemite (γ-Fe2O3) nanoparticles for methylene blue removal. In that study, polyHIPE monoliths were synthesized using either conventional hydrophobic oils (tetradecane) or hydrophobic deep eutectic solvents (DESs) as the internal phase, followed by in situ coprecipitation of γ-Fe2O3 nanoparticles. Variations in internal phase chemistry and surfactant content produced pronounced changes in pore morphology, degree of openness, BET specific surface area, magnetic nanoparticle loading, and adsorption performance. Importantly, the reported results demonstrate that adsorption efficiency emerges from a coupled interplay between structure, composition, and surface accessibility, involving nonlinear interactions among multiple descriptors rather than monotonic trends with any single variable [13].
Such coupled behavior is not unique to a single formulation or study but recurs across polyHIPE-based magnetic composites reported in the literature. Nevertheless, extracting quantitative and transferable structure–property–performance relationships from these dispersed experimental datasets remains challenging. Functional performance is typically discussed through qualitative comparisons or limited parametric trends, which are insufficient to resolve nonlinear interactions or trade-offs between competing design objectives [14,15]. This challenge is further exacerbated by the inherently low experimental throughput of polyHIPE synthesis and characterization, resulting in small but information-dense datasets that fall squarely within the small-data regime typical of advanced composite fabrication.
To facilitate interpretation of these coupled relationships, Figure 1 schematically summarizes the flow of information from formulation and processing variables to porous morphology, physicochemical properties, and functional performance, highlighting the multivariate mapping targeted by the machine-learning framework.
Machine learning (ML) provides a powerful and increasingly adopted framework for modeling complex, high-dimensional relationships in polymers and composite materials [16,17,18]. However, the effective application of ML in small-data settings requires careful algorithm selection and explicit strategies to mitigate overfitting. Tree-based ensemble methods, such as Random Forest and Gradient Boosting, are particularly well suited to this regime because they can capture nonlinear interactions, accommodate mixed descriptor types, and provide inherent regularization through ensembling and feature subsampling [19,20,21]. When combined with repeated cross-validation, uncertainty estimation, and interpretable diagnostics, these models enable robust learning from limited experimental data while preserving physically meaningful insight.
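Repeated cross-validation can be sketched without any ML library: the sample indices are reshuffled and re-partitioned into k folds several times, and the spread of held-out errors across all folds serves as a simple uncertainty estimate. The sketch below is illustrative only; the function names, the mean-baseline model standing in for a Random Forest or Gradient Boosting regressor, and the toy data are all hypothetical.

```python
import random
import statistics
from typing import Callable, Iterator, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def repeated_kfold(n: int, k: int, repeats: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for `repeats` independently shuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # near-equal fold sizes
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
            yield train, test


def cv_mae(X, y, fit: Callable[..., Predictor], k: int = 5, repeats: int = 10, seed: int = 0) -> List[float]:
    """Held-out mean absolute error for every fold of every repeat."""
    scores = []
    for train, test in repeated_kfold(len(y), k, repeats, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(statistics.mean(abs(predict(X[i]) - y[i]) for i in test))
    return scores


def fit_mean_baseline(X_train, y_train) -> Predictor:
    """Hypothetical stand-in model: always predicts the training-set mean."""
    mu = statistics.mean(y_train)
    return lambda x: mu


# Toy data (not from the source studies): 12 samples, one feature
X = [[float(i)] for i in range(12)]
y = [50.0 + 2.0 * i for i in range(12)]
scores = cv_mae(X, y, fit_mean_baseline, k=4, repeats=5)
print(f"MAE = {statistics.mean(scores):.2f} +/- {statistics.stdev(scores):.2f} ({len(scores)} folds)")
```

Reporting the mean together with the fold-to-fold standard deviation, rather than a single split, makes the optimism and variance of small-data estimates visible.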
Accordingly, interpretable ML approaches—including permutation importance, SHAP (Shapley additive explanations), and partial dependence analysis—are increasingly adopted to move beyond black-box prediction toward explanatory, association-based understanding and rational materials design [22,23,24]. Deep learning architectures were intentionally excluded from the present study because of their substantial data requirements and limited interpretability at the dataset sizes available here.
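Permutation importance, the simplest of these diagnostics, measures how much a fitted model's error grows when one descriptor column is shuffled, which breaks that descriptor's association with the target while leaving the others intact. A minimal stdlib sketch follows, assuming a fitted `predict` callable and MAE as the error metric; the toy model and data are hypothetical. In practice the shuffling would be applied to held-out folds, and scikit-learn provides a library implementation in `sklearn.inspection.permutation_importance`.

```python
import random
import statistics
from typing import Callable, List, Sequence


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: Sequence[List[float]],
    y: Sequence[float],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MAE when each feature column is shuffled independently."""
    rng = random.Random(seed)

    def mae(rows: Sequence[List[float]]) -> float:
        return statistics.mean(abs(predict(r) - t) for r, t in zip(rows, y))

    baseline = mae(X)
    importances = []
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only feature j permuted; other features keep their pairing
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            deltas.append(mae(permuted) - baseline)
        importances.append(statistics.mean(deltas))
    return importances


# Hypothetical fitted model that depends only on feature 0
def toy_model(x: List[float]) -> float:
    return 40.0 + 5.0 * x[0]


X = [[float(i), float((3 * i) % 5)] for i in range(10)]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

The ignored feature receives zero importance, whereas shuffling the feature the model actually uses raises the error.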
Despite these advances, most ML studies on polyHIPE-based composites remain confined to single experimental datasets, which limits assessment of the robustness and transferability of learned structure–property–performance relationships. In this context, cross-study analysis offers an important opportunity to evaluate whether such relationships persist across independently reported, yet chemically and morphologically compatible, material systems.
Beyond the specific material systems analyzed, the methodological contribution of this work lies in its explicit emphasis on cross-study robustness and interpretability within a small-data materials modeling context. Unlike prior ML-assisted studies of adsorption or composite performance that rely on single experimental datasets or prioritize predictive accuracy alone, the present framework (i) curates and harmonizes compatible experimental data across multiple independent studies, (ii) explicitly evaluates both intra-study and cross-study generalization to assess transferability rather than dataset-specific fitting, (iii) integrates uncertainty-aware validation strategies to mitigate optimism bias inherent to small datasets, and (iv) prioritizes interpretable diagnostics to translate associative ML outputs into physically plausible design insight. In addition, by selecting a direct engineering performance metric—removal efficiency (%) under fixed operating conditions—rather than fitted isotherm parameters, the framework emphasizes practical comparability and reduces sensitivity to model-dependent fitting artifacts. Collectively, these elements distinguish the present study from existing ML-assisted adsorption analyses and position it as a transparent and reproducible blueprint for extracting structure–property–performance insight from dispersed experimental literature on multifunctional polymer matrix composites.
While extrapolative machine-learning approaches have recently been proposed for materials design, particularly in metallic systems where controlled experimental validation enables prediction beyond the training domain [25], such strategies remain challenging for heterogeneous, literature-derived datasets typical of porous polymer matrix composites. Accordingly, the present study focuses on interpretable, uncertainty-aware modeling within a constrained experimental space. Specifically, we develop an interpretable cross-study machine-learning framework to analyze structure–property–performance relationships in macroporous polyHIPE-based magnetic polymer composites by integrating experimental data curated from multiple independent studies employing comparable formulation and characterization strategies, including the work of Vallejo-Macías et al. [13]. No new synthesis or experimental measurements are performed.
To maximize practical relevance and comparability across studies, a direct performance metric—removal efficiency (%) evaluated at fixed pH, contact time, and initial dye concentration—is adopted as the primary modeling target. Unlike fitted isotherm parameters, which depend on model choice and regression assumptions, removal efficiency represents a direct engineering outcome that is less susceptible to fitting artifacts while remaining immediately relevant for process design and material screening. By combining formulation variables, quantitative morphological descriptors derived from image analysis, and physicochemical properties, multiple ML models are evaluated under both intra-study and cross-study validation schemes with explicit consideration of prediction uncertainty. Beyond predictive accuracy, the analysis identifies consistent structural and compositional drivers of adsorption performance and elucidates trade-offs with magnetic functionality, demonstrating how interpretable ML can extract transferable design insight from dispersed experimental literature and support rational design of multifunctional polymer matrix composites.
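The difference between intra-study and cross-study validation lies in how samples are partitioned. Random folds mix samples from the same source study across training and test sets, whereas a cross-study split holds out all samples from one source together. The sketch below shows one simple leave-one-study-out variant; the study labels are hypothetical, and the exact partitioning used in the present analysis may differ. scikit-learn's `LeaveOneGroupOut` implements the same grouping idea.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_study_out(study_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_study, train_idx, test_idx); each study is held out exactly once."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, sid in enumerate(study_ids):
        groups[sid].append(i)
    for held_out in sorted(groups):
        train = [i for i, sid in enumerate(study_ids) if sid != held_out]
        yield held_out, train, groups[held_out]


# Hypothetical study labels for six samples
studies = ["A", "A", "A", "B", "B", "C"]
for held_out, train, test in leave_one_study_out(studies):
    print(f"hold out {held_out}: train={train} test={test}")
```

A large gap between the intra-study and leave-one-study-out errors indicates dataset-specific fitting rather than transferable structure–performance associations.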
2. Materials and Data Sources
The present study is based exclusively on previously published experimental data, and no new synthesis, characterization, or adsorption experiments were performed. All data were curated from multiple independent literature sources reporting macroporous polyHIPE-based magnetic polymer composites with comparable formulation strategies, porous architectures, and adsorption testing conditions. Collectively, these sources define a chemically and morphologically compatible dataset suitable for cross-study machine-learning analysis.
Among the curated studies, the work of Vallejo-Macías et al. [13] serves as a reference and anchor dataset, owing to its comprehensive reporting of formulation parameters, quantitative morphological descriptors, physicochemical properties, and adsorption performance metrics. Additional studies were selected based on the use of analogous polyHIPE synthesis routes, incorporation of magnetic iron-oxide nanoparticles, and evaluation of dye removal performance under controlled aqueous conditions. Where necessary, reported variables were harmonized to ensure consistency in descriptor definitions, measurement conventions, and units across studies.
By restricting the analysis to experimentally comparable systems and adopting standardized performance metrics, the assembled dataset enables both intra-study and cross-study validation of the machine-learning models, while remaining within the small-data regime characteristic of advanced porous polymer composite research.
2.1. Source Material System
Across the curated literature, macroporous polymer matrix composites are predominantly synthesized via high internal phase emulsion (HIPE) templating, followed by polymerization to generate open-cellular polyHIPE structures [4,5,6]. Within these systems, the resulting porous architecture is strongly governed by emulsion formulation variables, enabling systematic control over pore size, window size, interconnectivity, and degree of openness. Among the selected sources, the study by Vallejo-Macías et al. [13] represents a well-documented reference system, reporting macroporous polyacrylamide monoliths prepared via HIPE templating under controlled polymerization conditions.
Within the assembled dataset, two primary classes of internal phases are reported: conventional hydrophobic oils (e.g., tetradecane) and hydrophobic deep eutectic solvents (DESs) based on D,L-menthol and organic acids. The inclusion of both DES-based and conventional oil internal phases enables comparative assessment of internal phase chemistry with respect to emulsion stability, porous morphology, and downstream functional performance [13]. Across the source studies, surfactant content was systematically varied to modulate emulsion stability and pore architecture, while polymerization conditions were generally maintained constant within each individual study in order to isolate formulation-driven effects.
Following polyHIPE formation, the porous monoliths were functionalized with magnetic iron-oxide nanoparticles, most commonly maghemite (γ-Fe2O3), using in situ coprecipitation or closely related deposition strategies. This functionalization approach yields magnetic macroporous polymer matrix composites, enabling adsorption processes to be coupled with magnetic separability. Such strategies are consistent with established reports on magnetic porous polymer composites developed for environmental remediation and adsorption-based applications [10,11,12]. Taken together, the combination of HIPE-derived macroporosity and magnetic functionalization defines a chemically and structurally compatible material class suitable for cross-study machine-learning analysis.
2.2. Study Inclusion Criteria and Dataset Compatibility
To ensure methodological consistency and to minimize spurious variability in the cross-study machine-learning analysis, explicit quantitative and qualitative inclusion criteria were applied to all literature-derived data. Only studies reporting macroporous polyHIPE-based polymer matrix composites with comparable chemistry, porous architecture, characterization protocols, and performance metrics were deemed eligible for inclusion.
At the chemical and structural level, selected studies were required to employ polymer matrices synthesized via high internal phase emulsion (HIPE) templating, yielding open-cellular macroporous architectures. Systems based on fundamentally different porogens, foaming mechanisms, or non-HIPE-derived porous polymers were excluded to avoid conflating distinct structure-formation pathways. In addition, magnetic functionalization was restricted to iron-oxide-based nanoparticles (e.g., γ-Fe2O3 or Fe3O4) introduced either in situ or post-synthesis, thereby ensuring comparability of magnetic response and surface chemistry across studies.
At the morphological characterization level, included studies were required to report quantitative scanning electron microscopy (SEM)-based descriptors, including at a minimum the mean pore size and a metric reflecting pore connectivity or degree of openness derived from image analysis. Studies providing only qualitative micrographs without extractable numerical descriptors were excluded, as such data do not support multivariate modeling or interpretable machine-learning analysis.
At the functional performance level, adsorption behavior had to be reported using a directly comparable engineering metric, specifically removal efficiency (%) measured under fixed operating conditions with explicit specification of pH, contact time, and initial solute concentration. Studies reporting only fitted isotherm parameters or kinetic constants without a corresponding efficiency metric were excluded, as these parameters depend strongly on model assumptions and fitting procedures and are therefore not directly comparable across studies.
Collectively, these criteria define a closed and internally consistent experimental domain within which machine learning is applied to extract associative structure–property–performance relationships. Studies failing to meet one or more criteria were excluded, not due to a lack of scientific merit, but because their inclusion would introduce uncontrolled heterogeneity that cannot be reliably disentangled in a small-data regime.
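Because the criteria are explicit, they can be applied mechanically and each exclusion can be logged with its reason, which keeps the curation step auditable. The sketch below encodes the chemical, morphological, and performance criteria as a filter; the `StudyRecord` fields, string labels, and unit choices are hypothetical and serve only to illustrate the procedure.

```python
from dataclasses import dataclass
from typing import List, Optional

IRON_OXIDES = {"gamma-Fe2O3", "Fe3O4"}


@dataclass
class StudyRecord:
    """Hypothetical summary of what a candidate study reports."""
    name: str
    hipe_templated: bool
    magnetic_phase: str                 # e.g. "gamma-Fe2O3", "Fe3O4", "none"
    reports_mean_pore_size: bool
    reports_openness_metric: bool
    reports_removal_efficiency: bool
    ph: Optional[float] = None
    contact_time_min: Optional[float] = None
    c0_mg_per_l: Optional[float] = None


def exclusion_reasons(s: StudyRecord) -> List[str]:
    """Criteria a study fails; an empty list means the study is eligible."""
    reasons = []
    if not s.hipe_templated:
        reasons.append("porous matrix not HIPE-templated")
    if s.magnetic_phase not in IRON_OXIDES:
        reasons.append("no iron-oxide nanoparticles")
    if not (s.reports_mean_pore_size and s.reports_openness_metric):
        reasons.append("missing quantitative SEM descriptors")
    if not s.reports_removal_efficiency:
        reasons.append("removal efficiency (%) not reported")
    if None in (s.ph, s.contact_time_min, s.c0_mg_per_l):
        reasons.append("pH, contact time, or C0 not specified")
    return reasons


candidates = [
    StudyRecord("study_A", True, "gamma-Fe2O3", True, True, True, 7.0, 60.0, 20.0),
    StudyRecord("study_B", True, "Fe3O4", True, False, False),
]
for s in candidates:
    print(s.name, exclusion_reasons(s) or "eligible")
```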
Importantly, the potential impact of excluded studies is addressed by explicitly limiting all interpretations and conclusions to the defined compatibility domain. Accordingly, the present analysis does not claim universality across all porous adsorbents or polymer composites, but rather focuses on identifying robust and transferable trends within a well-defined class of macroporous polyHIPE-based magnetic polymer matrix composites.
Table 1 summarizes the quantitative criteria used to define study compatibility as well as the datasets incorporated into the analysis. Notably, not all compatible datasets were pooled for final model training. The macroporous polyacrylamide–γ-Fe2O3 system was retained as the primary training domain, while additional studies were selectively employed for cross-study validation, robustness assessment, and sensitivity analyses, thereby avoiding the introduction of uncontrolled functional heterogeneity into the core predictive models.
2.3. Structural and Physicochemical Characterization Data
The dataset assembled for the machine-learning analysis comprises quantitative descriptors spanning formulation variables, porous morphology, and physicochemical properties, as reported across the curated literature on macroporous polyHIPE-based magnetic polymer composites, including the reference study by Vallejo-Macías et al. [13]. To ensure cross-study consistency, only descriptors that were reported in a comparable, numerical, and operationally well-defined manner across multiple independent sources were retained for model development.
Porous morphology was primarily characterized using scanning electron microscopy (SEM) combined with image analysis, yielding quantitative metrics such as mean pore size, pore window size, and degree of openness. The degree of openness was determined following established geometric and statistical approaches widely adopted in the polyHIPE literature, which relate pore and window size distributions to network connectivity and accessible porosity [6,7]. Where necessary, reported values were normalized to ensure consistency in units, definitions, and numerical scales across studies.
Physicochemical properties relevant to adsorption performance were obtained from Brunauer–Emmett–Teller (BET) specific surface area measurements and from thermogravimetric or compositional analyses used to estimate magnetic iron-oxide (γ-Fe2O3) nanoparticle loading [13]. These descriptors represent emergent material properties arising from the combined effects of formulation parameters and porous architecture, rather than independently tunable experimental control variables. As such, they provide a critical intermediate link between processing-induced structure and functional performance within the machine-learning framework.
When compiling SEM-based morphological descriptors from multiple literature sources, heterogeneity in image acquisition and analysis protocols must be explicitly acknowledged. Differences in SEM magnification, contrast settings, thresholding strategies, and segmentation algorithms can introduce systematic variability in reported pore size, window size, and openness metrics, even for nominally similar porous structures.
In the present study, no attempt was made to reprocess or re-threshold original SEM micrographs, as raw images and segmentation parameters were not consistently available across the source studies. Instead, the reported numerical descriptors were treated as study-level measurements subject to protocol-dependent uncertainty, rather than as exact geometric quantities. This treatment is consistent with established practice in cross-study data synthesis, in which published quantitative metrics are accepted as the most reliable representations available within each experimental context, and their inherent uncertainty is addressed during modeling and interpretation rather than through retrospective image reanalysis.
2.4. Functional Performance Metrics
Adsorption performance across the curated studies was evaluated using methylene blue (MB) as a model organic dye under aqueous conditions, in accordance with widely adopted protocols for assessing adsorption behavior in porous polymer matrix composites [13]. For the purposes of the present cross-study machine-learning analysis, a direct and experimentally comparable engineering metric—removal efficiency (%) measured at fixed pH, contact time, and initial dye concentration (C0)—was selected as the primary modeling target.
The selection of removal efficiency as the target variable avoids ambiguities associated with fitted isotherm parameters (e.g., Langmuir or Freundlich constants), which depend strongly on model choice, fitting range, and regression methodology and are therefore difficult to compare across independent studies. In contrast, removal efficiency represents a direct outcome of material performance, integrating the combined effects of adsorption capacity, accessibility of active sites, and transport limitations. As such, it provides a more robust and practically meaningful basis for cross-study machine-learning analysis and materials screening.
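For reference, removal efficiency is conventionally computed from the initial dye concentration C0 and the residual concentration Ct remaining after the fixed contact time t. This standard definition is assumed here to match the protocols of the source studies; the numerical values in the worked example are illustrative only.

```latex
R\,(\%) = \frac{C_0 - C_t}{C_0} \times 100,
\qquad \text{e.g. } C_0 = 50~\mathrm{mg\,L^{-1}},\; C_t = 5~\mathrm{mg\,L^{-1}}
\;\Rightarrow\; R = \frac{50 - 5}{50} \times 100 = 90\,\%
```

Because R is normalized by C0, it is only comparable across studies when C0, pH, and contact time are fixed, which is why these conditions are part of the inclusion criteria.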
In addition to adsorption performance, saturation magnetization (Ms) values reported for γ-Fe2O3-functionalized polyHIPE composites were included as a secondary performance descriptor to capture multifunctional behavior related to magnetic recovery and separability [13]. The joint consideration of removal efficiency and saturation magnetization enables systematic analysis of potential trade-offs between adsorption effectiveness and magnetic functionality, which are central to the practical deployment of magnetic macroporous polymer composites.
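One way to make this trade-off explicit is to identify the non-dominated samples: those for which no other sample is at least as good in both removal efficiency and Ms and strictly better in one. The sketch below computes this Pareto set by pairwise comparison; the (removal efficiency %, Ms in emu/g) pairs are hypothetical and are not taken from the source studies.

```python
from typing import List, Sequence, Tuple


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of points not dominated when both coordinates are maximized."""
    front = []
    for i, (a_i, b_i) in enumerate(points):
        dominated = any(
            a_j >= a_i and b_j >= b_i and (a_j > a_i or b_j > b_i)
            for j, (a_j, b_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical (removal efficiency %, Ms emu/g) pairs, not from the source studies
samples = [(92.0, 4.1), (85.0, 7.5), (78.0, 6.0), (95.0, 2.0), (80.0, 7.5)]
print(pareto_front(samples))
```

Samples on the front represent different compromises between adsorption and magnetic recovery; samples off the front are outperformed on both objectives by at least one alternative.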
The choice of removal efficiency (%) measured under fixed pH, contact time, and initial solute concentration reflects a deliberate trade-off between physical completeness and cross-study comparability. While adsorption kinetics and isotherm parameters provide valuable mechanistic insight, these quantities are typically derived from model-dependent fitting procedures and experimental protocols that vary substantially across studies. Consequently, direct comparison of such fitted parameters can introduce systematic bias unrelated to intrinsic material performance.
By contrast, removal efficiency measured under explicitly defined operating conditions represents a direct engineering response that aggregates kinetic, equilibrium, and transport effects into a single observable metric. This characteristic makes it particularly well suited for cross-study data integration and machine-learning-based screening, provided that the relevant experimental conditions are consistently reported and controlled.
2.5. Data Curation and Scope
All variables used in the machine-learning models were extracted directly from published figures, tables, and Supplementary Materials reported in the curated literature sources, including the reference study by Vallejo-Macías et al. [13], thereby ensuring full traceability and reproducibility. When numerical values were not explicitly tabulated, data were digitized from graphical representations using consistent procedures and cross-checked against reported trends to minimize transcription errors. Descriptor definitions, units, and naming conventions were harmonized across studies prior to model construction to ensure cross-study consistency.
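Unit harmonization stays traceable when every reported value passes through an explicit conversion table, so that an unexpected unit fails loudly instead of passing through silently. In the sketch below, the descriptor keys, canonical units, and table entries are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical conversion factors from (descriptor, reported unit) to the canonical unit:
# pore size -> micrometres, BET area -> m^2/g, iron-oxide loading -> wt%.
TO_CANONICAL: Dict[Tuple[str, str], float] = {
    ("mean_pore_size", "um"): 1.0,
    ("mean_pore_size", "nm"): 1e-3,
    ("bet_area", "m2/g"): 1.0,
    ("fe2o3_loading", "wt%"): 1.0,
    ("fe2o3_loading", "mg/g"): 0.1,  # 1 mg/g = 0.1 wt%
}


def harmonize(descriptor: str, value: float, unit: str) -> float:
    """Convert a reported value to its canonical unit; unknown units raise instead of passing through."""
    try:
        factor = TO_CANONICAL[(descriptor, unit)]
    except KeyError:
        raise ValueError(f"no conversion defined for {descriptor!r} in {unit!r}") from None
    return value * factor


print(harmonize("mean_pore_size", 2500.0, "nm"))   # nm -> um
print(harmonize("fe2o3_loading", 150.0, "mg/g"))   # mg/g -> wt%
```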
Where quantitative values had to be extracted from published graphs, digitization was performed using standard graphical extraction tools in multiple independent passes to reduce operator bias. For each digitized data point, the median across passes was retained for modeling, and the dispersion among passes was treated as an estimate of digitization-related uncertainty. This procedure is consistent with widely accepted practices for secondary use of literature data and explicitly acknowledges that digitized values carry greater uncertainty than directly reported numerical measurements.
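The consolidation of repeated digitization passes can be sketched in a few lines. The text specifies the median as the retained value but does not fix a dispersion measure. The median absolute deviation (MAD) used below is one robust choice, assumed here because a single misplaced click barely affects it; the standard deviation across passes would be a simpler alternative. The pass values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def consolidate_passes(passes: Sequence[float]) -> Tuple[float, float]:
    """Median of repeated digitizations and the median absolute deviation (MAD) around it."""
    if len(passes) < 2:
        raise ValueError("need at least two digitization passes")
    center = statistics.median(passes)
    mad = statistics.median(abs(p - center) for p in passes)
    return center, mad


# Five hypothetical passes over one plotted point; the last pass is a misplaced click
value, uncertainty = consolidate_passes([81.2, 80.9, 81.5, 81.0, 84.0])
print(f"{value:.1f} +/- {uncertainty:.1f}")
```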
Experimental conditions related to hydrodynamics—such as mixing speed, agitation mode, particle size distribution, and reactor geometry—were not consistently reported across the source studies and therefore could not be included as explicit descriptors in the machine-learning models. These factors are known to influence apparent removal efficiency by affecting external mass transfer and contact dynamics. Within the present framework, such effects are treated as latent or unobserved variables that contribute to residual variability in the performance data rather than as controllable inputs. This treatment is aligned with the objective of extracting structure–property–performance associations that are robust across heterogeneous experimental implementations, rather than modeling reactor-scale or process-specific phenomena.
The resulting dataset represents a small-data regime characteristic of advanced porous composite fabrication and low-throughput materials synthesis, while still encompassing a diverse and interrelated set of descriptors spanning formulation variables, porous morphology, physicochemical properties, and functional performance. This balance enables meaningful multivariate analysis while preserving physical interpretability.
To account for heterogeneity in SEM image acquisition and analysis protocols across source studies, morphological descriptors were interpreted as noisy but informative variables that capture relative structural differences rather than exact geometric precision. Variability arising from differences in magnification, thresholding strategies, and segmentation procedures was therefore treated as an additional source of epistemic uncertainty intrinsic to literature-derived datasets.
Rather than attempting ad hoc normalization across studies—which could introduce further bias or artificial alignment—this uncertainty was implicitly accommodated through conservative model selection, cross-study validation strategies, and uncertainty-aware interpretation of results. In this context, machine learning is employed to identify robust associative trends that persist despite measurement heterogeneity, rather than to infer precise quantitative structure–property mappings.
Consistent with the objectives of the present cross-study analysis, the assembled dataset was treated as a closed experimental domain. Accordingly, model predictions and interpretability analyses are constrained to the range of compositions, structures, and test conditions represented in the underlying literature. Machine learning is therefore used as a tool for data-driven analysis, trend extraction, and insight generation, rather than for extrapolative prediction beyond the available experimental space. This conservative framing ensures that the conclusions drawn remain physically grounded and directly supported by existing experimental evidence.
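In practice, restricting predictions to the sampled experimental space can be enforced with an applicability-domain check. The simplest version, sketched below, flags any query whose descriptors fall outside the per-feature training range. This is a deliberately minimal assumption: a point can lie inside every marginal range yet still occupy an unsampled combination of descriptors. The feature names and values are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_ranges(X: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per-feature (min, max) over the training rows."""
    return [(min(col), max(col)) for col in zip(*X)]


def out_of_domain(x: Sequence[float], ranges: Sequence[Tuple[float, float]], names: Sequence[str]) -> List[str]:
    """Names of features for which x falls outside the training range."""
    return [n for v, (lo, hi), n in zip(x, ranges, names) if not lo <= v <= hi]


names = ["surfactant_wt_pct", "mean_pore_um", "bet_m2_g"]          # hypothetical descriptors
X_train = [[5.0, 10.0, 20.0], [10.0, 8.0, 35.0], [15.0, 6.0, 50.0]]  # toy values
ranges = feature_ranges(X_train)
print(out_of_domain([12.0, 7.0, 60.0], ranges, names))
```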
3. Feature Engineering
Feature engineering was conducted to construct a compact, physically meaningful, and cross-study compatible set of descriptors linking formulation parameters, porous morphology, physicochemical properties, and functional performance, in accordance with the data-flow framework illustrated in Figure 1. Given the limited size and heterogeneous provenance of the assembled dataset, particular emphasis was placed on minimizing feature redundancy, preserving physical interpretability, and ensuring that all selected descriptors could be consistently defined and extracted across independent literature sources.
To prevent information leakage and avoid trivial encoding of the target response, variables that directly or implicitly embed adsorption outcomes were excluded from the feature set. Instead, descriptors were selected to represent upstream, causally proximal factors—such as formulation choices and structural attributes—that precede functional performance within the materials processing chain. This strategy enables the machine-learning models to learn nonlinear associative mappings between design-relevant inputs and downstream performance metrics, rather than memorizing outcome-specific correlations.
Continuous descriptors were retained in their original physical units whenever possible to preserve interpretability and facilitate physically grounded reasoning. Categorical formulation variables (e.g., internal phase type) were encoded using simple binary representations, thereby avoiding the introduction of artificial ordinal relationships. No dimensionality-reduction techniques were applied, as such transformations would obscure the physical meaning of individual descriptors and complicate interpretation in the small-data regime.
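The two rules above, an explicit whitelist of upstream descriptors that excludes outcome-derived variables and a binary code for internal phase type with continuous descriptors left in physical units, can be sketched as a small encoder. The feature names, the forbidden-field list, and the example record are hypothetical.

```python
from typing import Dict, List

# Hypothetical whitelist of upstream descriptors, kept in physical units (no scaling, no PCA)
FEATURES = ["is_des", "surfactant_wt_pct", "mean_pore_um", "mean_window_um",
            "openness", "bet_m2_g", "fe2o3_wt_pct"]
# Hypothetical outcome-derived fields that must never enter the feature set (leakage guard)
FORBIDDEN = {"removal_efficiency_pct", "qe_mg_g", "residual_conc_mg_l"}
if FORBIDDEN & set(FEATURES):
    raise RuntimeError("outcome-derived variable listed as a feature")

PHASE_CODES = {"oil": 0.0, "DES": 1.0}  # binary, no artificial ordinal scale


def encode(record: Dict[str, object]) -> List[float]:
    """Map one harmonized record to a feature vector; target fields are ignored."""
    phase = record["internal_phase"]
    if phase not in PHASE_CODES:
        raise ValueError(f"unknown internal phase: {phase!r}")
    row = dict(record, is_des=PHASE_CODES[phase])
    return [float(row[name]) for name in FEATURES]


example = {"internal_phase": "DES", "surfactant_wt_pct": 10, "mean_pore_um": 8.5,
           "mean_window_um": 2.1, "openness": 0.3, "bet_m2_g": 25.0,
           "fe2o3_wt_pct": 12.0, "removal_efficiency_pct": 90.0}
print(encode(example))
```

The target field stays in the record for later use as the response, but it cannot reach the feature vector because only whitelisted names are read.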
Overall, the resulting feature set reflects a deliberate balance between model expressiveness and physical transparency, enabling robust learning across multiple literature sources while supporting downstream interpretability analyses and physically grounded insight generation.
3.1. Formulation and Processing Descriptors
Formulation-related variables were treated as primary input features because they represent experimentally controllable parameters that indirectly govern porous architecture and downstream material properties across polyHIPE-based systems. The selected formulation descriptors include the internal phase type (conventional hydrophobic oil versus deep eutectic solvent, DES) and surfactant content, both of which are consistently reported across the curated literature and are well established as key determinants of emulsion stability and emulsion-templated pore structure [4,5,6,13].
The internal phase type was encoded as a categorical (binary) descriptor to capture fundamental differences in emulsion chemistry, interfacial behavior, and templating mechanisms between conventional hydrophobic oils (e.g., tetradecane) and DES-based internal phases. Surfactant content was treated as a continuous variable, reflecting its direct role in controlling droplet stabilization, pore size distributions, and pore–window formation during high internal phase emulsion (HIPE) templating [4,5,6].
Polymerization conditions were reported as fixed within individual studies and were therefore not included as explicit model features. Instead, their constancy defines the experimental domain over which the learned structure–property–performance relationships are valid. This treatment avoids the introduction of redundant or weakly informative variables while preserving the physical and experimental context of the reported data.
3.2. Morphological (Structural) Descriptors
Quantitative descriptors of porous morphology were extracted from scanning electron microscopy (SEM) and associated image analysis as reported across the curated literature, including the reference study by Vallejo-Macías et al. [13]. These variables correspond to the structural level of the data-flow framework illustrated in Figure 1 and include the mean pore size, mean pore-window size, and degree of openness.
The degree of openness was calculated following established geometric formulations widely adopted in the polyHIPE literature, which relate pore and window size statistics to network connectivity and accessible porosity [6,7]. Together, these morphological descriptors encode transport-relevant characteristics of the macroporous network and have been shown to exert a strong influence on mass transfer, accessibility of active sites, and adsorption behavior in polyHIPE-based materials [5,8,9].
To mitigate multicollinearity and preserve interpretability in the small-data regime, only representative summary statistics (mean values) were retained, rather than full pore-size distributions or higher-order moments. This choice balances structural fidelity with statistical robustness, ensuring that the selected descriptors capture the dominant morphological trends while remaining consistently defined and comparable across independent studies.
3.3. Physicochemical Property Descriptors
Physicochemical descriptors represent emergent material properties arising from the combined effects of formulation parameters and porous morphology. In the present analysis, the primary physicochemical descriptors include the Brunauer–Emmett–Teller (BET) specific surface area and γ-Fe2O3 nanoparticle loading, as consistently reported across the curated literature, including the reference study by Vallejo-Macías et al. [13].
BET specific surface area provides a quantitative measure of the accessible interfacial area relevant to adsorption processes, integrating contributions from both macroporous architecture and surface-level features. In parallel, γ-Fe2O3 loading reflects the degree of magnetic functionalization and contributes to both adsorption behavior—through potential surface interactions—and magnetic response relevant to material recovery and separability.
Within the modeling framework, these variables were treated as intermediate-level descriptors, positioned between porous morphology and functional performance rather than as independently tunable control variables. This treatment allows the machine-learning models to learn associative mappings linking upstream formulation and structural attributes to downstream physicochemical properties and, in turn, functional performance, without imposing assumptions of direct or isolated causality.
As noted in Section 3.2, porous morphology was represented by mean pore size, mean pore-window size, and degree of openness rather than by full pore-size distributions or distribution-width metrics. Beyond limiting multicollinearity, this choice reflects a practical constraint of cross-study data integration: measures of polydispersity (e.g., standard deviation, coefficient of variation, or distribution shape) were not consistently reported across the source studies.
Mean morphological descriptors were therefore selected as robust, commonly available summary statistics that enable comparison across heterogeneous literature sources. Although this representation necessarily compresses structural variability, it captures the first-order differences in porous architecture that are known to strongly influence mass transport and accessibility in polyHIPE systems [5,7,9].
3.4. Feature Selection and Scaling Strategy
To mitigate overfitting in the small-data and cross-study regime, the feature set was deliberately restricted to descriptors with clear physical interpretation and demonstrated relevance to adsorption and magnetic performance in polyHIPE-based systems. Variables that directly encode, or trivially correlate with, functional outcomes—such as fitted isotherm parameters or performance-derived quantities—were explicitly excluded from the predictor set to prevent information leakage and artificial inflation of model accuracy.
All continuous features were standardized prior to model training to ensure comparable numerical scales and to avoid bias in algorithms sensitive to feature magnitude. Categorical formulation variables were encoded using simple binary representations to capture qualitative differences in material chemistry without introducing artificial ordinal relationships. No automated feature selection or dimensionality reduction techniques were applied prior to model fitting, as such approaches can obscure physical meaning and introduce instability in small datasets.
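A minimal sketch of this preprocessing with scikit-learn, using synthetic placeholder values and hypothetical variable names: the continuous descriptors are standardized, while the internal-phase type is encoded as a single 0/1 column and passed through unscaled.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 12

# Hypothetical descriptor table (synthetic values, not the curated data).
internal_phase = rng.choice(["tetradecane", "DES"], size=n)  # categorical
surfactant_wt = rng.uniform(5, 30, n)                         # continuous
pore_size_um = rng.uniform(5, 40, n)                          # continuous
bet_m2_g = rng.uniform(5, 60, n)                              # continuous

# Binary encoding: 1 = DES internal phase, 0 = conventional oil.
# A single 0/1 column implies no ordering between the two categories.
is_des = (internal_phase == "DES").astype(float)

X = np.column_stack([surfactant_wt, pore_size_um, bet_m2_g, is_des])
continuous_idx = [0, 1, 2]
binary_idx = [3]

# Standardize the continuous columns only; keep the binary flag as-is.
preprocess = ColumnTransformer(
    [("scale", StandardScaler(), continuous_idx),
     ("binary", "passthrough", binary_idx)]
)
X_scaled = preprocess.fit_transform(X)
print(X_scaled.round(2))
```

In a cross-validated workflow the transformer would sit inside a `Pipeline`, so that scaling statistics are estimated on training folds only and no information from test folds leaks into preprocessing.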
Instead, feature relevance was evaluated a posteriori using interpretable machine-learning techniques, including permutation importance, SHAP value analysis, and partial dependence diagnostics [22,23,24]. This strategy preserves transparency, enables direct comparison between data-driven importance rankings and established physical understanding of polyHIPE systems, and supports robust interpretation of structure–property–performance associations across multiple independent studies.
Table 2 summarizes the formulation, morphological, physicochemical, and performance descriptors selected for machine-learning modeling, together with their data types, physical units, and corresponding literature sources.
4. ML Models and Validation
4.1. Model Selection for the Small-Data Regime
The machine-learning (ML) models employed in this study were selected with explicit consideration of the small-data and cross-study conditions characteristic of advanced polyHIPE-based composite research. Under such constraints, robustness, resistance to overfitting, and interpretability are prioritized over extreme model flexibility or representational depth, as overly complex models are prone to instability and poor generalization when trained on heterogeneous, literature-derived datasets [16,17,24]. Accordingly, three complementary classes of regression models were evaluated.
As linear baselines, regularized linear regression models—Ridge and Least Absolute Shrinkage and Selection Operator (LASSO)—were employed to establish reference performance and to assess whether linear relationships between descriptors and target variables are sufficient to describe the observed trends across studies. These models impose strong bias control through ℓ2 and ℓ1 regularization, respectively, and provide conservative benchmarks against which the added value of nonlinear modeling can be quantitatively assessed.
To capture nonlinear interactions among formulation, morphological, and physicochemical descriptors, tree-based ensemble methods, specifically Random Forest (RF) and Gradient Boosting (GB) regressors, were selected as the primary nonlinear models. These algorithms are particularly well suited to small experimental datasets because they combine nonlinear function approximation with inherent regularization mechanisms, including bootstrap aggregation, feature subsampling, and shrinkage. As a result, they exhibit reduced sensitivity to noise, multicollinearity, and cross-study variability, while retaining strong predictive performance [19,20,21]. In addition, tree-based ensembles naturally accommodate mixed descriptor types without requiring extensive preprocessing or variable transformation.
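The linear and tree-based model families can be instantiated as below. This is an illustrative scikit-learn configuration on synthetic data; all hyperparameter values are assumptions rather than those tuned in the study. The comments mark where the regularization mechanisms named above (ℓ2 and ℓ1 penalties, bootstrap aggregation, feature subsampling, shrinkage) enter.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 4))  # synthetic descriptors
y = 60 + 30 * np.tanh(3 * X[:, 0]) + 10 * X[:, 1] * X[:, 2] + rng.normal(0, 2, 60)

alphas = np.logspace(-3, 3, 13)
models = {
    # l2 penalty; alpha selected by efficient leave-one-out CV inside RidgeCV
    "Ridge": make_pipeline(StandardScaler(), RidgeCV(alphas=alphas)),
    # l1 penalty; alpha selected by internal 5-fold CV
    "LASSO": make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)),
    # bootstrap aggregation + feature subsampling at each split
    "RF": RandomForestRegressor(
        n_estimators=200, max_features=0.5, min_samples_leaf=2, random_state=0
    ),
    # shrinkage (learning_rate) + row subsampling (stochastic boosting)
    "GB": GradientBoostingRegressor(
        n_estimators=200, learning_rate=0.05, max_depth=2, subsample=0.8, random_state=0
    ),
}
# Tree ensembles are scale-invariant, so only the linear models get a scaler.
fitted = {name: model.fit(X, y) for name, model in models.items()}
```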
As the third model class, Gaussian Process Regression (GPR) was implemented with a Radial Basis Function (RBF) kernel augmented by a white-noise term. The RBF kernel encodes the assumption that the target response varies smoothly with the input descriptors, an assumption that is physically consistent with polyHIPE-based composite systems, in which incremental changes in formulation, porous morphology, or physicochemical properties lead to continuous, rather than abrupt, variations in adsorption performance and magnetic response [24].
From a physical perspective, the RBF kernel reflects the expectation that materials with similar pore openness, specific surface area, and γ-Fe2O3 loading exhibit similar functional behavior, with correlations decaying smoothly as descriptor values diverge. This assumption is consistent with transport- and accessibility-governed adsorption mechanisms in macroporous networks, where performance gradients are typically continuous within a constrained synthesis and processing domain.
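A sketch of such a GPR configuration in scikit-learn, assuming an amplitude-scaled anisotropic RBF kernel (one length scale per descriptor) plus a `WhiteKernel` noise term. The anisotropy, the target normalization, and the synthetic data are illustrative choices, not details confirmed by the study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(40, 3))  # synthetic, pre-scaled descriptors
y = 70 + 20 * np.sin(2 * X[:, 0]) + 8 * X[:, 1] + rng.normal(0, 1.5, 40)

# Amplitude * anisotropic RBF + white noise (noise level is learned).
kernel = (ConstantKernel(1.0) * RBF(length_scale=np.ones(X.shape[1]))
          + WhiteKernel(noise_level=0.1))
gpr = GaussianProcessRegressor(
    kernel=kernel, normalize_y=True, n_restarts_optimizer=3, random_state=0
)
gpr.fit(X, y)

X_new = rng.uniform(0, 1, size=(5, 3))
mean, std = gpr.predict(X_new, return_std=True)  # posterior mean and std
print(gpr.kernel_)  # fitted amplitude, length scales, and noise level
```

The fitted length scales in `gpr.kernel_` indicate how quickly the modeled response changes along each descriptor, and `std` is the posterior predictive standard deviation, which in scikit-learn includes the learned white-noise level.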
4.2. Validation Strategy
To mitigate overfitting and ensure a reliable assessment of model performance under small-data and cross-study conditions, model evaluation was conducted using repeated K-fold cross-validation rather than a single train–test split. In this procedure, K-fold cross-validation was repeated multiple times using different random partitions in order to reduce variance associated with data splitting and to obtain statistically more stable performance estimates. This approach is particularly important for small and heterogeneous datasets, where individual samples may otherwise exert disproportionate influence on training and evaluation outcomes [16,17,24].
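Repeated K-fold evaluation reporting MAE, RMSE, and R2 as mean ± standard deviation can be sketched with scikit-learn's `RepeatedKFold` and `cross_validate`. The data are synthetic, and the numbers of splits and repeats are assumptions, since they are not stated here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_validate

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 4))  # synthetic descriptors
y = 60 + 30 * np.tanh(3 * X[:, 0]) + 10 * X[:, 1] * X[:, 2] + rng.normal(0, 2, 60)

# 5 folds x 5 repeats = 25 different train/test partitions.
cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=0)
scoring = {
    "mae": "neg_mean_absolute_error",
    "rmse": "neg_root_mean_squared_error",
    "r2": "r2",
}
res = cross_validate(
    RandomForestRegressor(n_estimators=100, random_state=0), X, y, cv=cv, scoring=scoring
)

# scikit-learn maximizes scores, so error metrics come back negated.
mae = -res["test_mae"]
rmse = -res["test_rmse"]
r2 = res["test_r2"]
for name, s in [("MAE", mae), ("RMSE", rmse), ("R2", r2)]:
    print(f"{name}: {s.mean():.3f} +/- {s.std():.3f} over {s.size} folds")
```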
All model hyperparameters were optimized exclusively within the cross-validation loops to prevent information leakage and optimistic bias. Predictive performance was quantified using the mean absolute error (MAE) and root mean squared error (RMSE), which provide direct interpretability in the physical units of the target variable. The coefficient of determination (R2) was also reported as a complementary metric but interpreted with caution, given its known sensitivity to sample size, variance structure, and target distribution in small datasets.
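Optimizing hyperparameters only inside the cross-validation loop corresponds to nested cross-validation. A minimal sketch, assuming a grid search over two Random Forest hyperparameters; the grid, fold counts, and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, RepeatedKFold, cross_validate

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 4))  # synthetic descriptors
y = 60 + 30 * np.tanh(3 * X[:, 0]) + 10 * X[:, 1] * X[:, 2] + rng.normal(0, 2, 60)

# Inner loop: the search only ever sees the outer training fold.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(
    RandomForestRegressor(n_estimators=30, random_state=0),
    param_grid={"max_depth": [2, 4, None], "min_samples_leaf": [1, 3]},
    cv=inner_cv,
    scoring="neg_mean_absolute_error",
)

# Outer loop: held-out folds are never used for tuning.
outer_cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)
res = cross_validate(
    search, X, y, cv=outer_cv, scoring="neg_mean_absolute_error", return_estimator=True
)
outer_mae = -res["test_score"]
chosen = [est.best_params_ for est in res["estimator"]]  # may differ per fold
print(f"outer MAE = {outer_mae.mean():.2f} +/- {outer_mae.std():.2f}")
```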
Beyond cross-validation, additional safeguards against overfitting were implemented through deliberate feature parsimony, explicit exclusion of target-derived or performance-embedded predictors, and systematic comparison against regularized linear baselines. Nonlinear models that did not consistently outperform linear regression were not considered to provide meaningful additional insight into structure–property–performance associations. This conservative validation strategy ensures that reported performance gains reflect genuine nonlinear learning rather than artifacts arising from model flexibility or limited data availability.
To explicitly assess potential study-level information leakage, a group-aware validation strategy was implemented in parallel. Each experimental source study was treated as a distinct group, and model evaluation was repeated using a Leave-One-Study-Out (LOSO) cross-validation scheme. Under LOSO validation, all samples from one study were excluded from training and used exclusively for testing, ensuring that study-specific patterns related to synthesis, characterization, or testing protocols could not be implicitly memorized by the models.
LOSO validation provides a stringent and application-relevant assessment of generalization in cross-study machine-learning analyses, where subtle protocol-dependent similarities may otherwise inflate apparent predictive performance. Model performance under LOSO validation was directly compared with that obtained from repeated K-fold cross-validation to quantify the magnitude of any optimism bias arising from within-study sample overlap.
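A LOSO evaluation can be sketched with scikit-learn's `LeaveOneGroupOut`, passing a study identifier as the group label. The study IDs and the per-study offset used to mimic protocol-dependent bias are synthetic assumptions; the difference between the K-fold and LOSO metrics is the optimism estimate described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(0)
n_studies, per_study = 6, 10
groups = np.repeat(np.arange(n_studies), per_study)  # hypothetical study IDs
X = rng.uniform(0, 1, size=(groups.size, 4))
study_offset = rng.normal(0, 4, n_studies)[groups]   # simulated protocol bias
y = 60 + 30 * np.tanh(3 * X[:, 0]) + study_offset + rng.normal(0, 2, groups.size)

model = RandomForestRegressor(n_estimators=100, random_state=0)
logo = LeaveOneGroupOut()
# Each study is held out in turn; its samples are never seen in training.
pred_loso = cross_val_predict(model, X, y, groups=groups, cv=logo)
pred_kfold = cross_val_predict(model, X, y, cv=KFold(5, shuffle=True, random_state=0))

for name, p in [("LOSO", pred_loso), ("K-fold", pred_kfold)]:
    print(f"{name}: MAE = {mean_absolute_error(y, p):.2f}, R2 = {r2_score(y, p):.3f}")
optimism_mae = mean_absolute_error(y, pred_loso) - mean_absolute_error(y, pred_kfold)
```

Metrics here are pooled over out-of-fold predictions; per-fold averages could be reported instead.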
To further evaluate the possibility that high apparent predictive performance could arise from chance correlations, a target permutation test was conducted. In this test, the response variable (removal efficiency) was randomly permuted while preserving the full modeling pipeline, including feature sets, hyperparameter tuning, and validation strategy. Performance obtained under permuted targets defines a null distribution corresponding to chance-level learning, against which observed model performance can be objectively benchmarked.
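scikit-learn's `permutation_test_score` implements this null-distribution test: it refits the estimator on permuted targets and returns the observed score, the permuted scores, and an empirical p-value. Synthetic data and a small number of permutations keep the sketch fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, permutation_test_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 4))  # synthetic descriptors
y = 60 + 30 * np.tanh(3 * X[:, 0]) + 10 * X[:, 1] * X[:, 2] + rng.normal(0, 2, 60)

score, perm_scores, p_value = permutation_test_score(
    RandomForestRegressor(n_estimators=30, random_state=0),
    X,
    y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
    n_permutations=50,
    random_state=0,
)
# p = (number of permuted scores >= observed + 1) / (n_permutations + 1)
print(f"observed R2 = {score:.3f}; null mean = {perm_scores.mean():.3f}; p = {p_value:.3f}")
```

To permute under the full pipeline including tuning, as described above, a `GridSearchCV` object wrapping the model would be passed as the estimator instead of a fixed model, at a correspondingly higher cost.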
Finally, model stability was assessed by explicitly quantifying the distribution of predictive performance across repeated cross-validation runs. For each model, R2 was reported as mean ± standard deviation across all repetitions, rather than as a single point estimate. The Random Forest model achieved a mean cross-validated R2 of 0.93 ± 0.03, indicating stable performance across different data partitions and limited sensitivity to individual samples. In contrast, regularized linear baselines exhibited larger variability in R2 across folds, consistent with their reduced capacity to capture the nonlinear interactions present in the data.
4.3. Uncertainty Quantification
Prediction uncertainty was explicitly incorporated into the modeling framework to account for data scarcity and cross-study heterogeneity inherent to literature-derived datasets. For Gaussian Process Regression (GPR) models, predictive mean values and associated variances were obtained directly from the posterior distribution, providing a principled, fully probabilistic estimate of uncertainty grounded in the Bayesian formulation of the model.
For tree-based ensemble models, prediction uncertainty was approximated using distributional statistics derived from ensemble outputs, reflecting variability across individual trees and bootstrap-resampled training sets. Although these uncertainty estimates are not strictly probabilistic, they provide a practical and informative measure of model dispersion and sensitivity to data perturbations in small experimental datasets, where formal probabilistic modeling may not always be feasible.
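The ensemble dispersion measure can be sketched by collecting the prediction of every tree in a fitted `RandomForestRegressor` (exposed through `estimators_`). The 5th to 95th percentile band is an illustrative choice and, as noted above, it is a spread measure rather than a calibrated probabilistic interval. Data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(70, 4))  # synthetic descriptors
y = 60 + 30 * np.tanh(3 * X[:, 0]) + rng.normal(0, 2, 70)
X_train, y_train, X_query = X[:60], y[:60], X[60:]

rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=2, random_state=0)
rf.fit(X_train, y_train)

# One row per tree: shape (n_trees, n_query).
per_tree = np.stack([tree.predict(X_query) for tree in rf.estimators_])
mean_pred = per_tree.mean(axis=0)   # matches rf.predict(X_query)
spread = per_tree.std(axis=0)       # between-tree dispersion
band_lo, band_hi = np.percentile(per_tree, [5, 95], axis=0)
for m, s, lo, hi in zip(mean_pred, spread, band_lo, band_hi):
    print(f"{m:6.1f}  sd={s:4.1f}  [{lo:6.1f}, {hi:6.1f}]")
```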
In the GPR framework, an explicit white-noise kernel component was included to account for experimental uncertainty arising from heterogeneous measurement protocols, digitization of graphical data, and inter-study variability. This noise term prevents the model from interpolating noisy observations exactly, which would otherwise yield overconfident predictions, and allows uncertainty estimates to capture both intrinsic data noise and model uncertainty. Such treatment is particularly important for small-data, cross-study analyses, where unmodeled variability can otherwise lead to misleadingly narrow confidence bounds. The resulting uncertainty behavior is summarized in Table 3.
Sensitivity analysis using alternative kernel functions—specifically Matérn kernels with ν = 3/2 and ν = 5/2, as well as a linear kernel—demonstrated that both predictive performance and uncertainty trends were qualitatively stable across smooth kernel choices. Although minor quantitative differences in error metrics and uncertainty magnitude were observed, no smooth kernel selection led to a reordering of influential descriptors or to changes in the principal interpretability conclusions derived from the Gaussian Process Regression (GPR) analysis.
In contrast, the linear kernel yielded substantially inferior predictive performance, indicating that smooth but nonlinear functional relationships are required to adequately represent the underlying structure–property–performance mapping in macroporous polyHIPE-based composites. This result is consistent with the well-established multivariate and interacting effects governing adsorption behavior and magnetic functionality in porous polymer networks. Collectively, these observations support the robustness of the GPR findings and confirm that the inferred relationships are not artifacts of a particular kernel choice.
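The kernel sensitivity check can be sketched by swapping the kernel while keeping the rest of the GPR setup and the cross-validation splits fixed. Here `Matern(nu=1.5)` and `Matern(nu=2.5)` correspond to the ν = 3/2 and ν = 5/2 kernels and `DotProduct` to the linear kernel, each combined with a `WhiteKernel` term. The data are synthetic, and the amplitude and length-scale initializations are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, DotProduct, Matern, WhiteKernel
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 3))  # synthetic, pre-scaled descriptors
y = 70 + 20 * np.sin(3 * X[:, 0]) + 8 * X[:, 1] * X[:, 2] + rng.normal(0, 1.5, 50)
d = X.shape[1]

kernels = {
    "RBF": ConstantKernel() * RBF(length_scale=np.ones(d)),
    "Matern nu=3/2": ConstantKernel() * Matern(length_scale=np.ones(d), nu=1.5),
    "Matern nu=5/2": ConstantKernel() * Matern(length_scale=np.ones(d), nu=2.5),
    "Linear (DotProduct)": DotProduct(),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # identical splits for all kernels
results = {}
for name, base in kernels.items():
    gpr = GaussianProcessRegressor(kernel=base + WhiteKernel(), normalize_y=True, random_state=0)
    results[name] = cross_val_score(gpr, X, y, cv=cv, scoring="r2")
    print(f"{name:20s} R2 = {results[name].mean():.3f} +/- {results[name].std():.3f}")
```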
Measurement uncertainty arising from both experimental variability and literature data digitization was explicitly considered in the interpretation of model outputs. Although uncertainty estimates were not uniformly reported across all source studies, digitization-related uncertainty was incorporated at the descriptor level by treating extracted values as noisy observations rather than as exact numerical quantities. This approach reflects the epistemic uncertainty intrinsic to literature-derived datasets and mitigates the risk of overconfidence in reported precision.
For probabilistic models such as GPR, this uncertainty is directly propagated into the predictive variance through the posterior distribution. For ensemble-based models, uncertainty is reflected through the dispersion of predictions across individual trees and repeated cross-validation folds. This strategy enables uncertainty associated with heterogeneous measurement protocols and digitized data to be incorporated into prediction confidence without introducing unverifiable assumptions beyond what is supported by the published literature.
Reporting uncertainty alongside point predictions is particularly critical in small-data and cross-study contexts, where apparently strong predictive accuracy may otherwise obscure model fragility or localized overconfidence. Uncertainty-aware predictions facilitate conservative interpretation of results and enable identification of regions within the descriptor space where predictions are less reliable due to sparse data coverage or limited experimental representation. In this sense, machine learning is employed as an analytical tool for insight generation and hypothesis support, rather than as a black-box predictor for unvalidated extrapolation.
4.4. Interpretability and Model Diagnostics
Rather than relying exclusively on predictive accuracy, model interpretability was treated as a central objective of the present study. Feature relevance was evaluated a posteriori using permutation importance and SHAP (Shapley Additive Explanations), which quantify the contribution of individual descriptors to model predictions while reflecting the nonlinear and interaction effects learned by the model [22,23,24]. These approaches provide complementary global and local perspectives on model behavior and are particularly well suited to small-data settings, where transparency, robustness, and physical plausibility are essential.
To further interrogate model responses, partial dependence plots (PDPs) were employed to visualize the marginal effects of key descriptors on the target variables. PDPs enable identification of nonlinear trends, saturation behavior, and potential trade-offs that are not readily captured by aggregate importance metrics alone. When interpreted in conjunction with permutation importance and SHAP analyses, PDPs facilitate systematic comparison between data-driven relevance rankings and established physical understanding of polyHIPE-based systems.
Importantly, all interpretability analyses are strictly constrained to the experimental domain defined by the curated literature, including the reference dataset reported by Vallejo-Macías et al. [13]. No extrapolative interpretations are made beyond the range of observed compositions, porous structures, or test conditions represented in the underlying data. This conservative framing ensures that the identified associations reflect physically plausible, experimentally supported relationships, rather than artifacts arising from model flexibility, data sparsity, or uncontrolled heterogeneity.
5. Interpretability Analysis
Interpretability was treated as a central objective of the present study rather than as a secondary diagnostic, with the explicit aim of ensuring that machine-learning (ML) results yield physically meaningful insight into structure–property–performance relationships in macroporous polyHIPE-based magnetic polymer composites. This emphasis is particularly critical in small-data and cross-study regimes, where apparently high predictive accuracy may arise from spurious correlations and where purely black-box models offer limited scientific value or actionable design guidance [16,17,18].
Accordingly, the interpretability analysis is directed toward identifying robust and transferable drivers of functional performance that persist across independently reported, yet chemically and morphologically compatible, material systems. By integrating complementary global and local interpretability tools, the analysis evaluates whether ML-derived trends are consistent with established physical understanding of polyHIPE architectures and adsorption mechanisms, while simultaneously revealing nonlinear effects and multifunctional trade-offs that are difficult to extract using conventional univariate or parametric approaches.
5.1. Model Performance and Sensitivity
To assess the potential risk of information leakage associated with intermediate-level descriptors, a sensitivity analysis was conducted by retraining the Random Forest models using progressively restricted feature sets. Specifically, models were trained using (i) the full descriptor set, (ii) a reduced set excluding BET specific surface area, and (iii) a further restricted set excluding both BET specific surface area and γ-Fe2O3 nanoparticle loading.
The comparative results indicate that, although absolute predictive performance decreases as intermediate descriptors are removed, the models retain statistically meaningful predictive capability and preserve the relative importance of key morphological descriptors, particularly degree of openness and pore connectivity. This behavior suggests that BET surface area and nanoparticle loading contribute complementary information that enhances predictive resolution, rather than acting as trivial or implicit proxies for the target variable. The consistency of descriptor rankings across restricted feature sets supports the robustness of the inferred structure–property–performance relationships and mitigates concerns regarding information leakage, as summarized in Table 4.
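The restricted-feature retraining can be sketched as a loop over the three feature sets. The descriptor names and column order are hypothetical and the data synthetic; for brevity, the ranking uses the Random Forest's impurity-based `feature_importances_`, whereas the analysis in the text relies on permutation importance and SHAP.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Hypothetical descriptor names and column order (synthetic data).
feature_names = ["surfactant", "pore_size", "window_size", "openness", "BET", "Fe2O3_loading"]
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, len(feature_names)))
X[:, 4] = 0.7 * X[:, 3] + 0.3 * X[:, 4]  # make BET partly track openness
y = 55 + 25 * X[:, 3] + 10 * X[:, 4] + 5 * X[:, 5] + rng.normal(0, 2, 60)

feature_sets = {
    "(i) full": feature_names,
    "(ii) no BET": [f for f in feature_names if f != "BET"],
    "(iii) no BET, no Fe2O3": [f for f in feature_names if f not in ("BET", "Fe2O3_loading")],
}
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
summary = {}
for label, cols in feature_sets.items():
    idx = [feature_names.index(c) for c in cols]
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    r2 = cross_val_score(rf, X[:, idx], y, cv=cv, scoring="r2")
    rf.fit(X[:, idx], y)
    # Impurity-based ranking, used here only as a quick stand-in.
    ranking = [cols[i] for i in np.argsort(rf.feature_importances_)[::-1]]
    summary[label] = (r2.mean(), r2.std(), ranking)
    print(f"{label}: R2 = {r2.mean():.2f} +/- {r2.std():.2f}; top-2 = {ranking[:2]}")
```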
5.2. Feature Importance and Global Interpretability
Global feature relevance was initially assessed using permutation importance, which quantifies the decrease in model performance when the values of a given descriptor are randomly permuted while preserving the joint distribution of the remaining variables [22,23]. This model-agnostic approach provides a robust estimate of feature influence and is particularly well suited for cross-study datasets, where assumptions regarding functional form, linearity, or descriptor independence may not hold. Permutation importance was therefore employed to rank formulation, morphological, and physicochemical descriptors according to their contribution to predictive accuracy for the target variables.
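Permutation importance is available as `sklearn.inspection.permutation_importance`. In the sketch below it is computed on a held-out split, so that importance reflects predictive rather than training accuracy; the data are synthetic, with feature 0 made dominant by construction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(120, 4))  # synthetic descriptors
y = 50 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 1, 120)  # feature 0 dominates

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Each column of the test set is shuffled n_repeats times; the drop in R2
# relative to the unshuffled score is that feature's importance.
imp = permutation_importance(rf, X_te, y_te, n_repeats=20, random_state=0, scoring="r2")
order = np.argsort(imp.importances_mean)[::-1]
for j in order:
    print(f"feature {j}: {imp.importances_mean[j]:.3f} +/- {imp.importances_std[j]:.3f}")
```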
To complement the permutation-based analysis and to explicitly capture nonlinear interactions and context-dependent effects, SHAP (Shapley Additive Explanations) values were computed for the tree-based ensemble models [22]. SHAP provides a unified, game-theoretic framework for decomposing individual predictions into additive contributions from each descriptor, enabling consistent comparison of feature relevance across samples and across studies. In the present work, SHAP summary plots were used to identify the dominant drivers of removal efficiency and magnetic saturation, while simultaneously revealing the directionality, magnitude, and variability of their effects across the assembled dataset.
Table 5 compares the predictive performance of linear and nonlinear machine-learning models for dye removal efficiency, highlighting the superior accuracy of tree-based ensemble methods in the small-data regime. The reported metrics, namely mean absolute error (MAE), root mean squared error (RMSE), and the mean cross-validated coefficient of determination (R2), were obtained using repeated K-fold cross-validation to ensure statistically robust performance estimation and to mitigate overfitting under small-data, cross-study conditions.
As hypothesized, models capable of capturing nonlinear relationships, namely Random Forest (RF) and Gradient Boosting (GB), substantially outperformed the linear baseline (Elastic Net). The linear model achieved a cross-validated coefficient of determination (R2) of 0.65, indicating that approximately 35% of the variance in removal efficiency is not captured by additive linear effects. In contrast, the Random Forest regressor exhibited the highest predictive accuracy, with a mean cross-validated R2 of 0.93 and a low mean absolute error (MAE) of 2.4 percentage points of removal efficiency. This pronounced performance gap indicates that nonlinear and coupled associations among formulation parameters, porous morphology, and physicochemical properties dominate the observed variation in functional performance for macroporous polyHIPE-based magnetic composites.
To assess whether the prominence of BET specific surface area and degree of openness in the SHAP analysis could arise from descriptor collinearity rather than genuine structural relevance, pairwise correlation analysis and variance inflation factor (VIF) diagnostics were performed (Figure S3; Tables S4 and S5). A strong positive correlation was observed between BET specific surface area and degree of openness (Pearson’s r ≈ 0.78), consistent with their shared physical origin in pore connectivity and accessible internal surface area.
Importantly, VIF values for all descriptors remained below commonly accepted thresholds for severe multicollinearity (VIF < 5), indicating that although structural descriptors are partially correlated, they do not introduce numerical instability or artificially inflate feature dominance within the models. Accordingly, SHAP-derived importance is interpreted at the level of descriptor groups representing coupled structural accessibility effects, rather than as evidence of isolated, independent drivers.
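The collinearity diagnostics can be sketched with NumPy and scikit-learn: Pearson's r from `np.corrcoef`, and each VIF from regressing one descriptor on all the others, VIF_j = 1/(1 − R2_j). The helper function and the deliberately correlated synthetic descriptors are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def variance_inflation_factors(X):
    """Hypothetical helper: VIF_j = 1 / (1 - R2_j), where R2_j comes from an
    ordinary least-squares fit (with intercept) of column j on all other columns."""
    X = np.asarray(X, dtype=float)
    vifs = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


rng = np.random.default_rng(0)
n = 200
openness = rng.normal(size=n)
bet = 0.8 * openness + 0.6 * rng.normal(size=n)  # synthetic, correlated by construction
pore_size = rng.normal(size=n)                    # synthetic, independent
X = np.column_stack([openness, bet, pore_size])

r_bet_openness = np.corrcoef(openness, bet)[0, 1]
vif = variance_inflation_factors(X)
flagged = vif >= 5.0  # severe-multicollinearity threshold quoted in the text
print(f"r = {r_bet_openness:.2f}; VIF = {np.round(vif, 2)}; flagged = {flagged}")
```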
Under this interpretation, the dominance of BET specific surface area and degree of openness reflects a coupled set of structural-accessibility descriptors governing adsorption performance, rather than a single controlling variable. This conclusion is physically consistent with the intrinsic coupling characteristic of polyHIPE systems, in which formulation parameters jointly shape pore connectivity, internal surface area, and transport pathways.
Based on its superior predictive stability and interpretability, the Random Forest model was selected as the primary analytical framework for the subsequent interpretability analyses (Sections 5.3 and 5.4). Although Gaussian Process Regression (GPR) exhibited lower predictive performance (R2 = 0.85 versus 0.93 for Random Forest), it provided complementary value through explicit uncertainty quantification, which is particularly important for drawing robust conclusions from small, heterogeneous, literature-derived datasets.
5.3. Local Interpretability and Partial Dependence Analysis
Beyond global importance rankings, partial dependence plots (PDPs) were employed to visualize the marginal association between selected descriptors and model predictions while averaging over the influence of all remaining variables [23]. PDPs were used to examine nonlinear trends, threshold behavior, and saturation effects associated with key morphological descriptors (e.g., degree of openness) and physicochemical properties (e.g., BET specific surface area and γ-Fe2O3 loading). These visualizations facilitate qualitative interpretation of how incremental changes in individual features are associated with predicted performance within the experimentally observed domain.
Local interpretability was further investigated using sample-specific SHAP analyses, which decompose individual predictions into additive contributions from each descriptor. This local perspective reveals how specific combinations of formulation, structural, and physicochemical features are jointly associated with high or low predicted removal efficiency. Such analyses are particularly valuable for identifying representative material configurations within the dataset and for linking machine-learning predictions back to experimentally realizable formulations, thereby strengthening the connection between data-driven insight and practical composite design.
To assess whether the saturation behavior observed in the partial dependence plots reflects robust structure–performance associations rather than modeling artifacts arising from sparse data regions, individual conditional expectation (ICE) plots were generated for the same key descriptors. Unlike PDPs, which display averaged marginal effects, ICE plots visualize sample-specific response trajectories and therefore enable evaluation of heterogeneity and stability across the dataset.
The ICE analysis indicates that the apparent saturation trends for descriptors such as degree of openness and BET specific surface area are consistently reproduced across individual samples. Most ICE curves exhibit monotonic increases followed by plateau-like behavior, suggesting convergence of predicted performance at higher descriptor values. Importantly, these plateaus occur within regions of relatively high sample density rather than at the boundaries of the descriptor space. To further contextualize these findings, data-density indicators were overlaid on the corresponding PDPs, confirming that the saturation regions correspond to well-sampled portions of the experimental domain.
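PDP and ICE curves for one descriptor can be obtained together from `sklearn.inspection.partial_dependence` with `kind="both"`. In this synthetic sketch the target saturates in feature 0 by construction; centering each ICE curve at its first grid point (a common variant, assumed here) makes differences in curve shape easier to compare.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(80, 3))  # synthetic descriptors
# Saturating response in feature 0 (synthetic, by construction).
y = 60 + 30 * (1 - np.exp(-5 * X[:, 0])) + 5 * X[:, 1] + rng.normal(0, 1.5, 80)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

res = partial_dependence(rf, X, features=[0], kind="both", grid_resolution=20, method="brute")
pdp = res["average"][0]     # averaged curve over the grid (PDP)
ice = res["individual"][0]  # one curve per sample: (n_samples, n_grid)

# Centered ICE: subtract each curve's value at the first grid point, then
# measure how much the sample-specific trajectories disagree along the grid.
ice_centered = ice - ice[:, [0]]
spread = ice_centered.std(axis=0)
print(np.round(pdp, 1))
print(np.round(spread, 2))
```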
Taken together, the combined PDP–ICE analysis supports the interpretation that the observed saturation behavior reflects stable associative trends present in the compiled experimental data, rather than numerical artifacts induced by extrapolation or insufficient data coverage. Nevertheless, all interpretations remain strictly constrained to the experimental space defined by the curated literature, and the observed plateaus are discussed as data-supported tendencies rather than universal physical limits.
5.4. Partial Dependence, Individual Conditional Expectation, and Data Density Analysis
To evaluate the robustness of the partial dependence trends reported in the main text, individual conditional expectation (ICE) plots were generated for all descriptors exhibiting nonlinear or saturating behavior in the PDP analysis. ICE curves were computed using the same trained Random Forest model employed for the primary interpretability assessment, thereby ensuring methodological consistency and avoiding confounding effects arising from model variation.
In parallel, kernel density estimates of the feature distributions were calculated and overlaid on the corresponding PDPs to explicitly delineate regions of high and low data density. This combined visualization framework enables clear discrimination between apparent saturation arising from averaging effects in sparsely sampled regions and saturation supported by multiple experimental observations, as illustrated in Figure 2.
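The density overlay can be sketched with `scipy.stats.gaussian_kde`, evaluated on a grid spanning the 5th to 95th percentiles of the descriptor (the default range of scikit-learn PDP grids). The descriptor values are synthetic, and the 25%-of-peak threshold used to mark sparsely sampled grid points is a hypothetical choice.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic openness values: a dense cluster plus a small, sparse group.
openness = np.concatenate([rng.normal(0.35, 0.05, 40), rng.normal(0.6, 0.03, 8)])

# Grid over the 5th-95th percentiles, mirroring scikit-learn's default PDP range.
lo, hi = np.percentile(openness, [5, 95])
grid = np.linspace(lo, hi, 20)

kde = gaussian_kde(openness)  # bandwidth from Scott's rule by default
density = kde(grid)

# Hypothetical rule: grid points below 25% of the peak density count as
# sparsely sampled, so PDP plateaus there would be read with caution.
well_sampled = density >= 0.25 * density.max()
for g, dens, ok in zip(grid, density, well_sampled):
    print(f"{g:.3f}  density={dens:5.2f}  {'dense' if ok else 'sparse'}")
```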
Across all examined descriptors, regions where PDPs exhibit plateau-like behavior coincide with dense clusters of ICE trajectories and elevated feature density. This convergence indicates that the observed saturation effects are not driven by isolated samples or extrapolative model behavior, but instead reflect stable associative patterns present in the compiled experimental data. Collectively, these results reinforce the robustness of the interpretability conclusions under the small-data and cross-study conditions characterizing the present analysis.
5.5. Physical Plausibility and Causality Considerations
All interpretability results were evaluated in the context of established physical understanding of polyHIPE-based systems and strictly within the experimental boundaries defined by the curated literature, including the reference study by Vallejo-Macías et al. [13]. Importantly, relationships identified through machine-learning analysis are interpreted as associative patterns observed within the available data, rather than as evidence of strong, isolated, or universal physical causation. This distinction is essential for porous polymer matrix composites, in which formulation parameters, porous morphology, and physicochemical properties are intrinsically coupled through shared synthesis and functionalization pathways.
To avoid over-interpretation, conclusions derived from feature-importance rankings, SHAP analyses, and partial dependence plots were restricted to trends that are physically plausible and consistent with established transport, accessibility, and interfacial interaction mechanisms in macroporous networks. In particular, the identified roles of accessible surface area, pore connectivity, and structural openness in governing adsorption behavior are in direct agreement with prior experimental and theoretical studies of polyHIPE systems [5,6,7]. When ML-derived trends deviated from simple monotonic expectations, such behavior was interpreted as indicative of nonlinear interactions or competing effects among descriptors, rather than as contradictions of established physical theory.
Accordingly, all interpretability outcomes are framed as data-supported hypotheses suitable for guiding targeted experimental validation, rather than as definitive mechanistic claims. This conservative interpretative strategy ensures that machine learning functions as a tool for insight generation and hypothesis refinement, while remaining firmly grounded in experimentally supported physical principles and the defined compatibility domain of the source literature.
5.6. Role of Interpretability in Data-Driven Design
The interpretability framework adopted in this study enables the translation of machine-learning outputs into actionable design insight for macroporous polyHIPE-based magnetic polymer composites. By identifying structural and compositional descriptors that are most strongly associated with removal efficiency and magnetic functionality, the analysis supports rational screening of formulation strategies within the experimentally explored design space, rather than unguided trial-and-error optimization.
Crucially, interpretability allows the learned associations to be evaluated in the context of physical plausibility, facilitating discrimination between robust, transferable trends and dataset-specific artifacts arising from limited sample size or cross-study heterogeneity. In this way, interpretable machine learning functions as a decision-support tool that guides hypothesis generation and prioritization of experimental variables for further investigation, rather than as a substitute for mechanistic modeling or targeted experimental validation.
This perspective is consistent with emerging practices in data-driven materials science, particularly in small-data regimes, where interpretable machine-learning approaches are increasingly recognized as effective mechanisms for bridging limited experimental datasets and informed materials design while maintaining transparency, robustness, and physical grounding [16,17,18,24].
6. Results and Discussion
6.1. Model Performance Under Cross-Validation
The predictive performance of the machine-learning (ML) models was evaluated using repeated K-fold cross-validation to ensure a robust assessment under the small-data conditions characteristic of polyHIPE-based composite fabrication. Model performance was quantified using the mean absolute error (MAE) and root mean squared error (RMSE), which provide direct interpretability in the physical units of the target variable (removal efficiency, %). The coefficient of determination (R²) was additionally reported as a complementary metric but interpreted with caution, given its sensitivity to variance structure and sample size in small datasets.
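The three metrics can be written out explicitly. The following is a minimal NumPy sketch, assuming plain arrays of measured and predicted removal efficiencies; the function name `regression_metrics` and the four example values are hypothetical and are not drawn from the curated dataset.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE and RMSE (in the units of y, here % removal) and R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    # R^2 compares the residual sum of squares with the spread of y_true;
    # with few samples this denominator is itself a noisy quantity.
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical measured vs. predicted removal efficiencies (%)
metrics = regression_metrics([90.0, 95.0, 80.0, 70.0], [88.0, 96.0, 83.0, 71.0])
print(metrics)
```

Because the R² denominator is the variance of the evaluated samples, a fold that happens to contain a narrow efficiency range can yield a low R² even when MAE and RMSE are small, which is one reason to read R² alongside the error metrics.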
Regularized linear models (Ridge and LASSO) were employed as conservative baselines to capture broad linear associations between descriptors and removal efficiency. However, their predictive accuracy was consistently inferior to that of nonlinear models, indicating that linear relationships alone are insufficient to represent the coupled effects of formulation parameters, porous morphology, and physicochemical properties on adsorption performance. This observation is consistent with prior experimental and theoretical studies showing that polyHIPE functionality emerges from interacting structural accessibility and transport phenomena rather than from monotonic dependence on individual variables [7,8,9,13].
Tree-based ensemble models, including Random Forest (RF) and Gradient Boosting (GB), consistently outperformed the linear baselines across cross-validation folds. These models achieved lower MAE and RMSE values, reflecting their ability to capture nonlinear interactions and conditional dependencies among formulation, morphological, and physicochemical descriptors. Importantly, these performance gains were obtained without excessive variance across folds, suggesting that ensemble-based regularization mechanisms—such as bootstrap aggregation and feature subsampling—effectively mitigated overfitting despite the limited dataset size.
Gaussian Process Regression (GPR) exhibited predictive accuracy comparable to that of the tree-based ensembles while additionally providing explicit estimates of prediction uncertainty. Although GPR performance was sensitive to kernel selection and hyperparameter tuning, its probabilistic formulation proved valuable for identifying regions of the descriptor space where predictions are less reliable due to sparse data coverage. This capability complements deterministic ensemble models and supports more conservative and uncertainty-aware interpretation of ML-derived trends in the small-data, cross-study context.
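A comparison of this kind can be sketched with scikit-learn, assuming a numeric descriptor matrix `X` and a removal-efficiency vector `y`. Here the data are synthetic (scikit-learn's Friedman #1 benchmark, 60 samples), and the hyperparameters, the 5-fold × 3-repeat scheme, and the kernel choice are illustrative assumptions rather than the configuration used in this study.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import RepeatedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic nonlinear stand-in for a small descriptor table (60 samples, 6 descriptors)
X, y = make_friedman1(n_samples=60, n_features=6, noise=0.5, random_state=0)

# Anisotropic RBF (one length scale per descriptor) plus a white-noise term
gpr_kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel(1e-2)

models = {
    # Regularized linear baselines; scaling is refit inside every training fold
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "LASSO": make_pipeline(StandardScaler(), Lasso(alpha=0.05, max_iter=10_000)),
    # Bootstrap aggregation plus feature subsampling at each split
    "RF": RandomForestRegressor(n_estimators=200, max_features=0.5, random_state=0),
    # Shallow trees, a small learning rate and row subsampling act as regularizers
    "GB": GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=2,
                                    subsample=0.8, random_state=0),
    "GPR": make_pipeline(StandardScaler(),
                         GaussianProcessRegressor(kernel=gpr_kernel, normalize_y=True,
                                                  random_state=0)),
}

cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
scoring = {"MAE": "neg_mean_absolute_error",
           "RMSE": "neg_root_mean_squared_error",
           "R2": "r2"}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {
        "MAE": -scores["test_MAE"].mean(),    # scikit-learn reports errors as negatives
        "RMSE": -scores["test_RMSE"].mean(),
        "R2_mean": scores["test_R2"].mean(),
        "R2_std": scores["test_R2"].std(),    # dispersion across the 15 folds
    }
    print(name, {k: round(float(v), 3) for k, v in results[name].items()})
```

Placing the scaler inside each pipeline keeps test folds out of the scaling statistics, and `R2_std` is the kind of dispersion statistic used to judge stability across repetitions.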
6.2. Cross-Study Generalization Behavior
Cross-study generalization was examined to evaluate whether the machine-learning models capture transferable structure–property–performance relationships rather than correlations specific to individual studies. Given the inherent heterogeneity of literature-derived datasets—arising from variations in synthesis protocols, characterization methodologies, and reporting practices—robust generalization across independent studies constitutes a stringent and practically relevant test of model validity.
Models trained on pooled multi-study data maintained strong predictive performance under repeated K-fold cross-validation, indicating that the dominant nonlinear relationships learned by the ensemble methods are not confined to a single experimental source. In particular, the Random Forest and Gradient Boosting models exhibited stable error metrics across folds containing samples from different studies, suggesting that the learned mappings between formulation variables, porous morphology, and physicochemical descriptors generalize across chemically and structurally compatible polyHIPE-based systems.
Under Leave-One-Study-Out (LOSO) validation, predictive performance decreased relative to standard cross-validation, as expected for this more conservative assessment of generalization. Nevertheless, the Random Forest model retained substantial predictive capability, indicating that the learned relationships are not dominated by study-specific artifacts but instead reflect transferable associative patterns across independently reported datasets.
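Leave-One-Study-Out validation maps directly onto scikit-learn's `LeaveOneGroupOut` splitter when each sample carries a study identifier. The sketch below is a minimal illustration under assumed conditions: five hypothetical studies of twelve synthetic samples each, with a simulated study-specific offset standing in for between-study heterogeneity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(0)
n_studies, per_study = 5, 12
study_id = np.repeat(np.arange(n_studies), per_study)  # hypothetical study labels
X = rng.uniform(0, 1, size=(n_studies * per_study, 4))
study_offset = rng.normal(0, 2, size=n_studies)[study_id]  # simulated between-study bias
y = 60 + 30 * X[:, 0] * X[:, 1] + 10 * X[:, 2] + study_offset + rng.normal(0, 1, len(X))

logo = LeaveOneGroupOut()
rf = RandomForestRegressor(n_estimators=200, random_state=0)
# Each fold holds out every sample of one study, so the predictions for that
# study come from a model that never saw any of its samples.
y_pred = cross_val_predict(rf, X, y, groups=study_id, cv=logo)

per_study_mae = {
    int(s): mean_absolute_error(y[study_id == s], y_pred[study_id == s])
    for s in np.unique(study_id)
}
print("LOSO folds:", logo.get_n_splits(groups=study_id))
print("MAE per held-out study:", {k: round(v, 2) for k, v in per_study_mae.items()})
```

Reporting the error per held-out study, rather than only pooled, makes it visible whether a single atypical study drives the overall drop relative to random K-fold splits.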
The controlled reduction in performance observed under LOSO validation delineates the realistic bounds of cross-study generalization in small, heterogeneous materials datasets, while simultaneously supporting the robustness and practical relevance of the proposed modeling framework, as summarized in Table 6.
Nonetheless, a modest increase in prediction uncertainty was observed for samples originating from studies occupying sparsely populated regions of the descriptor space, such as formulations exhibiting extreme pore openness or atypical γ-Fe2O3 loadings. This behavior is fully consistent with the uncertainty analyses discussed in Section 4.3 and reflects limited data representation rather than deficiencies in model formulation or training. Importantly, the incorporation of uncertainty-aware predictions enabled explicit identification of these regimes, reinforcing the value of probabilistic diagnostics for responsible interpretation in cross-study machine-learning analyses.
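How a Gaussian process signals sparse coverage can be illustrated with scikit-learn's `GaussianProcessRegressor`, whose `predict(..., return_std=True)` returns a predictive standard deviation alongside the mean. The two-descriptor synthetic data, the RBF-plus-white-noise kernel, and the query points below are assumptions chosen only to contrast a well-sampled point with one far outside the training domain.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(1)
# Training descriptors confined to the unit square (the "sampled" domain)
X_train = rng.uniform(0, 1, size=(40, 2))
y_train = np.sin(3 * X_train[:, 0]) + X_train[:, 1] ** 2 + rng.normal(0, 0.05, 40)

kernel = ConstantKernel(1.0) * RBF(length_scale=[0.5, 0.5]) + WhiteKernel(1e-2)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X_train, y_train)

X_query = np.array([[0.5, 0.5],   # well inside the sampled domain
                    [2.5, 2.5]])  # far outside it (unsampled region)
mean, std = gpr.predict(X_query, return_std=True)
for point, m, s in zip(X_query, mean, std):
    print(point, f"mean={m:.2f}", f"std={s:.2f}")
```

Far from the training data the posterior reverts toward the prior, so the predictive standard deviation widens; flagging query points whose `std` exceeds a chosen threshold is one simple way to mark sparsely covered regimes.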
Overall, these results demonstrate that interpretable machine-learning models—when developed using conservative feature selection, rigorous validation strategies, and explicit uncertainty quantification—can extract robust and transferable structure–property–performance trends from small and heterogeneous experimental datasets. At the same time, the observed limitations underscore the importance of continued expansion of curated datasets and careful harmonization of descriptors to further enhance cross-study generalization and confidence in future data-driven investigations of multifunctional polymer matrix composites.
6.3. Predicted Versus Experimental Removal Efficiency
Parity plots comparing predicted and experimentally measured removal efficiency values show good overall agreement for the nonlinear models, particularly Random Forest (RF), Gradient Boosting (GB), and Gaussian Process Regression (GPR). For these models, the majority of predictions cluster closely around the identity line, indicating the absence of systematic bias toward over- or underestimation across the range of removal efficiencies reported in the curated literature, including the reference dataset of Vallejo-Macías et al. [13]. Deviations from perfect agreement are more pronounced at the extremes of the performance range, where experimental sample density is limited, underscoring the influence of data availability on predictive reliability.
In contrast, the linear baseline models exhibit substantially greater dispersion around the parity line, especially for samples associated with high removal efficiency. This behavior suggests that high-performance regimes are associated with nonlinear combinations of porous openness, accessible surface area, and magnetic nanoparticle loading, which cannot be adequately captured by purely additive linear relationships. These observations are consistent with the comparative performance metrics summarized in Table 2 and further support the necessity of nonlinear modeling approaches for polyHIPE-based composite systems.
The parity plot presented in Figure 3 illustrates the strong agreement between experimental removal efficiency values and predictions generated by the optimized Random Forest model. Data points cluster tightly around the line of identity (y = x), visually corroborating the high cross-validated coefficient of determination (R² = 0.93) reported in Table 4. The low dispersion and absence of systematic bias across the full efficiency range—particularly in the high-performance regime (≈90–98%)—support the robustness of the RF model and its ability to generalize associative structure–property–performance relationships learned from a limited and heterogeneous experimental dataset.
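Parity data of this kind are usually built from out-of-fold predictions, so that every plotted point comes from a model that did not see that sample during training. The following is a minimal sketch using scikit-learn's `cross_val_predict`, assuming synthetic data and a 5-fold split (not the study's actual settings); plotting `y` against `y_oof` together with the line y = x gives the parity plot.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(60, 4))
y = 55 + 40 * X[:, 0] * X[:, 1] + rng.normal(0, 2, 60)  # synthetic "removal efficiency" (%)

rf = RandomForestRegressor(n_estimators=300, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
y_oof = cross_val_predict(rf, X, y, cv=cv)  # one out-of-fold prediction per sample

residuals = y_oof - y
print("mean bias (pred - exp):", round(residuals.mean(), 2))
print("out-of-fold R2:", round(r2_score(y, y_oof), 3))

# Bias in the lowest and highest tenth of the experimental range,
# where sample density is typically lowest
order = np.argsort(y)
k = len(y) // 10
print("bias, lowest 10%:", round(residuals[order[:k]].mean(), 2))
print("bias, highest 10%:", round(residuals[order[-k:]].mean(), 2))
```

Tree ensembles average leaf values, so their predictions cannot leave the range of the training targets and tend to be pulled toward the mean near the extremes; comparing the two tail biases is a simple check for that effect.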
To translate the high predictive accuracy of the Random Forest model into actionable design insight, the global contribution of each input descriptor to the prediction of removal efficiency was quantified using permutation importance. This approach evaluates feature relevance by randomly permuting the values of a given descriptor and measuring the resulting degradation in model performance, thereby providing a robust, model-agnostic estimate of its influence that explicitly accounts for nonlinear interactions among variables.
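A minimal scikit-learn sketch of this procedure follows, assuming a held-out evaluation split and synthetic data. The descriptor names are hypothetical labels, and `surfactant` is deliberately uninformative so that its importance should sit near zero.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
names = ["openness", "bet_area", "fe2o3_loading", "surfactant"]  # hypothetical labels
X = rng.uniform(0, 1, size=(80, 4))
# Synthetic target: the last column (surfactant) has no effect
y = 50 + 25 * X[:, 0] + 15 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 1.5, 80)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Each column of the held-out set is shuffled n_repeats times; the mean drop in
# R^2 relative to the unshuffled score is that descriptor's importance.
result = permutation_importance(rf, X_te, y_te, n_repeats=30, random_state=0, scoring="r2")
for i in result.importances_mean.argsort()[::-1]:
    print(f"{names[i]:>14s}  {result.importances_mean[i]:.3f} ± {result.importances_std[i]:.3f}")
```

When descriptors are strongly correlated (as degree of openness and BET area may be), permuting one of them creates unrealistic combinations and the importance can be shared between them, so rankings of correlated descriptors are best read together.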
Figure 4 presents the resulting hierarchy of influential design factors. The analysis indicates that adsorption performance is associated primarily with porous structural characteristics rather than being directly governed by formulation-level variables. In particular, degree of openness and BET specific surface area dominate the importance ranking by a substantial margin. This outcome suggests that mass transport efficiency and accessibility of adsorption sites, enabled by highly interconnected pore networks and large accessible surface areas, are the principal factors associated with overall removal efficiency within the examined material class.
By contrast, formulation-related descriptors such as internal phase type and surfactant content exhibit a comparatively lower global contribution once their downstream structural effects are implicitly accounted for by the model. Notably, γ-Fe2O3 nanoparticle loading ranks third in overall importance, indicating that while magnetic functionalization is critical for recoverability and multifunctionality, its direct associative contribution to adsorption efficiency is secondary to the morphological features governing transport and surface accessibility. This ranking reinforces the conceptual separation between structure-driven adsorption performance and composition-driven auxiliary functionality, a distinction that is central to the rational design of multifunctional macroporous polymer matrix composites.
To move beyond the global rankings provided by permutation importance, SHAP (Shapley Additive Explanations) summary analysis was employed to interrogate the Random Forest model at the sample level (Figure 5). SHAP analysis provides detailed insight into model behavior by quantifying, for each individual observation, how the value of a given descriptor contributes to increasing (positive SHAP value) or decreasing (negative SHAP value) the predicted removal efficiency.
The SHAP summary plot corroborates the overall feature hierarchy identified in Figure 4 while adding essential information on effect directionality and nonlinear behavior:
Morphological drivers (degree of openness and BET specific surface area). Both descriptors exhibit strong and consistently positive contributions to removal efficiency. High values of degree of openness and BET surface area (red points) are predominantly associated with positive SHAP values, indicating that increased pore connectivity and accessible interfacial area systematically enhance adsorption performance. This result reinforces the conclusion that transport accessibility and available surface area constitute the primary design levers for maximizing removal efficiency.
Nanoparticle loading trade-off (γ-Fe2O3 loading). The SHAP distribution for γ-Fe2O3 loading reveals a pronounced nonlinear effect. While many samples with higher nanoparticle loadings show positive SHAP values, consistent with an increased number of active sites, a distinct subset of high-loading samples contributes negatively to predicted performance. This pattern is consistent with a performance trade-off, suggesting that excessive γ-Fe2O3 loading can hinder adsorption efficiency, potentially due to pore blockage, particle aggregation, or reduced accessibility of the polymeric surface.
Internal phase type. The categorical analysis of internal phase type reveals a systematic preference within the studied domain: one internal phase (encoded as the higher categorical value) is consistently associated with positive SHAP values, indicating superior performance relative to the alternative. This finding highlights the indirect yet reproducible influence of internal phase chemistry on porous architecture development and, consequently, on adsorption behavior.
Surfactant content and mean pore size. These descriptors are largely centered around zero SHAP value, confirming their lower global relevance. Their dispersed and near-neutral contributions indicate that, within the investigated range, their effects on removal efficiency are secondary and highly context-dependent rather than dominant performance drivers.
Taken together, the SHAP analysis extends beyond global importance rankings by revealing descriptor-specific regimes and trade-offs that are directly relevant to engineering design. In particular, it suggests an effective upper bound for γ-Fe2O3 loading within the sampled domain and further underscores the primacy of structural openness and surface accessibility as key optimization targets for macroporous polyHIPE-based magnetic polymer composites.
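To show what a SHAP value is without a dedicated library, the sketch below computes exact Shapley values from their definition, using an interventional value function in which descriptors outside a coalition are drawn from a background sample. It is a brute-force illustration, not the algorithm implemented in the `shap` package (which uses much faster tree-specific methods); `exact_shapley`, `toy_model`, and the background data are hypothetical, and the cost grows as 2^n in the number of descriptors.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, background):
    """Exact interventional Shapley values of `predict` at the single sample `x`.

    Descriptors outside a coalition take their values from the rows of
    `background`; the coalition value is the mean prediction over those rows.
    """
    n = x.shape[0]

    def coalition_value(members):
        X = np.array(background, dtype=float, copy=True)
        idx = list(members)
        X[:, idx] = x[idx]  # fix coalition members to the explained sample
        return float(np.mean(predict(X)))

    base_value = coalition_value(())  # expected prediction over the background
    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                gain = coalition_value(members + (j,)) - coalition_value(members)
                phi[j] += weight * gain
    return phi, base_value


def toy_model(X):
    # Hypothetical stand-in for a fitted model's predict(); includes an interaction
    return 50 + 20 * X[:, 0] + 10 * X[:, 1] * X[:, 2]


rng = np.random.default_rng(4)
background = rng.uniform(0, 1, size=(100, 3))
x = np.array([0.9, 0.8, 0.7])
phi, base = exact_shapley(toy_model, x, background)
print("SHAP values:", phi.round(3), "base value:", round(base, 3))
print("base + sum(phi):", round(base + phi.sum(), 3), "prediction:", toy_model(x[None, :])[0])
```

The last line illustrates the additivity property behind SHAP summary plots: the base value plus all descriptor attributions reproduces the model's prediction for that sample. The `predict` method of any fitted regressor, such as a Random Forest, can be passed in place of `toy_model`.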
To explicitly examine multifunctional trade-offs in the studied polyHIPE-based magnetic polymer composites, the relationship between adsorption performance and magnetic recoverability was analyzed by jointly considering removal efficiency (%) and magnetic saturation (Ms).
This representation provides a physically intuitive complement to the machine-learning–based interpretability analysis by making explicit the competing functional objectives that govern practical material selection. Rather than identifying a single “optimal” formulation, the Pareto analysis delineates a set of non-dominated solutions, thereby enabling informed design choices depending on whether adsorption capacity or magnetic recoverability is prioritized in a given application context.
The Pareto front shown in Figure 6 was constructed directly from the experimentally reported values of removal efficiency and magnetic saturation as competing objectives. A non-dominated sorting procedure was applied, whereby a sample was classified as Pareto-optimal if no other sample simultaneously exhibited both higher removal efficiency and higher magnetic saturation. This approach extracts trade-off relationships directly from the experimental design space without imposing any assumed functional form or optimization model.
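The non-dominated sorting step can be sketched in a few lines of NumPy, assuming both objectives are to be maximized. One assumption should be noted: the function uses the standard Pareto-dominance test (at least as good in both objectives and strictly better in at least one), which coincides with the criterion stated above except when samples tie in one objective. The sample values are hypothetical.

```python
import numpy as np


def pareto_front(objectives):
    """Boolean mask of non-dominated rows, with every objective maximized.

    Row i is dominated if some row j is >= row i in all objectives
    and > row i in at least one.
    """
    obj = np.asarray(objectives, dtype=float)
    mask = np.ones(obj.shape[0], dtype=bool)
    for i in range(obj.shape[0]):
        dominates_i = np.all(obj >= obj[i], axis=1) & np.any(obj > obj[i], axis=1)
        if dominates_i.any():
            mask[i] = False
    return mask


# Hypothetical samples: (removal efficiency %, magnetic saturation Ms)
samples = np.array([
    [95.0, 20.0],
    [90.0, 35.0],
    [85.0, 30.0],  # dominated by [90, 35]
    [80.0, 45.0],
    [92.0, 18.0],  # dominated by [95, 20]
])
front = pareto_front(samples)
print(front)
```

The pairwise comparison is O(n²), which is negligible for literature-sized datasets and avoids any assumption about the shape of the front.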
To account for uncertainty arising from experimental variability and model prediction error, an uncertainty envelope was superimposed on the Pareto front. For removal efficiency, uncertainty bounds were estimated from the cross-validated prediction dispersion of the Random Forest model, while for magnetic saturation, experimentally reported variability was used where available. The resulting envelope provides a qualitative indication of the robustness of the Pareto-optimal region rather than a formal probabilistic confidence interval.
Importantly, the Pareto front is interpreted as an associative representation of multifunctional trade-offs within the experimentally sampled domain. It is not intended as a prescriptive optimum or an extrapolative design target beyond the bounds of the available data.
6.4. Robustness and Overfitting Considerations
Multiple complementary indicators support the robustness of the predictive models developed in this study. First, the consistent ranking of model performance across repeated cross-validation runs indicates that the reported results are not artifacts of favorable data partitioning. Second, the absence of disproportionate performance gains from increasingly complex models suggests that the observed improvements arise from physically meaningful nonlinear structure–property–performance relationships rather than from excessive model flexibility or memorization of the training data. Third, systematic comparison against regularized linear baselines provides a conservative reference, confirming that nonlinear modeling is warranted while remaining well controlled.
Model robustness is further evidenced by the narrow dispersion of performance metrics across cross-validation repetitions. In particular, the low standard deviation of R² values for the Random Forest and Gradient Boosting models indicates that the reported high predictive accuracy is not driven by isolated influential samples or fortuitous data splits. This stability contrasts with the behavior typically associated with optimism bias in small datasets, where performance metrics fluctuate strongly across folds. Taken together with baseline comparisons and uncertainty-aware prediction analysis, these results provide convergent evidence that the observed performance reflects genuine associative structure–property–performance relationships rather than overfitting artifacts.
The permutation test results further confirm that the high predictive performance of the Random Forest model does not arise from statistical chance. Models trained on permuted targets consistently yielded near-zero or negative R² values and substantially higher RMSE, indicating the absence of learnable structure once the descriptor–target relationship is destroyed. The observed R² value of 0.93 lies well outside the null distribution, providing strong evidence that the reported performance reflects meaningful associative structure rather than optimism bias. The distribution of cross-validated R² values obtained under random permutation of the response variable (removal efficiency) is reported in Table S1 and Figure S2 of the Supplementary Materials.
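scikit-learn's `permutation_test_score` implements this kind of test: it refits the model on label-permuted copies of the data and returns the observed score, the null scores, and an empirical p-value. The sketch below assumes synthetic data, 30 permutations, and 5-fold cross-validation; these are illustrative settings, not those used to produce Table S1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, permutation_test_score

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(60, 4))
y = 50 + 30 * X[:, 0] + 15 * X[:, 1] * X[:, 2] + rng.normal(0, 2, 60)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
# The same CV scheme is applied to the real targets and to each permuted copy
score, perm_scores, p_value = permutation_test_score(
    rf, X, y, scoring="r2", cv=cv, n_permutations=30, random_state=0
)
print(f"observed CV R2 = {score:.3f}")
print(f"null R2: mean = {perm_scores.mean():.3f}, max = {perm_scores.max():.3f}")
print(f"p-value = {p_value:.3f}")
```

With 30 permutations the smallest attainable p-value is 1/31 ≈ 0.032, so resolving smaller p-values requires proportionally more permutations.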
Equally important, explicit consideration of prediction uncertainty constrains over-interpretation of model outputs. Uncertainty estimates obtained from Gaussian Process Regression and ensemble-based dispersion analyses reveal that model confidence varies across the descriptor space. Samples associated with uncommon combinations of porous morphology and γ-Fe2O3 loading exhibit higher predictive uncertainty, reflecting sparse data coverage rather than model failure. Rather than constituting a limitation, this behavior provides actionable guidance for future experimental design by identifying regions where additional data acquisition would most effectively improve model fidelity and generalization.
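One simple form of ensemble-based dispersion is the spread of the individual trees' predictions in a Random Forest. The following sketch assumes synthetic data and reads the fitted trees from scikit-learn's `estimators_` attribute; the resulting standard deviation is a heuristic indicator of disagreement among trees, not a calibrated confidence interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(60, 2))
y = 60 + 30 * X[:, 0] * X[:, 1] + rng.normal(0, 1, 60)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

X_query = np.array([[0.5, 0.5], [0.95, 0.95]])
# Predictions of every individual tree: shape (n_trees, n_query)
per_tree = np.stack([tree.predict(X_query) for tree in rf.estimators_])
tree_mean = per_tree.mean(axis=0)  # equals the forest prediction
tree_std = per_tree.std(axis=0)    # heuristic dispersion across trees
for point, m, s in zip(X_query, tree_mean, tree_std):
    print(point, f"mean={m:.2f}", f"tree std={s:.2f}")
```

Unlike a Gaussian process, a Random Forest predicts a constant beyond the edge of its training data, so tree dispersion does not necessarily grow far outside the sampled domain; this is one reason to combine it with GPR-based uncertainty.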
Finally, sensitivity analysis indicates that the strong performance of the Random Forest model is not solely driven by intermediate variables closely related to adsorption performance. Although inclusion of BET specific surface area and γ-Fe2O3 loading enhances predictive accuracy, their removal does not collapse model performance nor alter the qualitative structure–performance trends identified through interpretability analysis. This finding suggests that the model captures distributed, multivariate associations rather than relying on information leakage through derived or post hoc descriptors. Accordingly, intermediate variables are interpreted as physically meaningful mediators that link formulation and porous morphology to functional performance, rather than as shortcuts that trivially encode the target response.
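A drop-descriptor sensitivity check can be sketched as follows, assuming synthetic data in which a hypothetical BET-like descriptor is partly determined by openness. The model is cross-validated once with all descriptors and once with the putative intermediate variables removed; the descriptor names, the generating function, and the CV scheme are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(7)
n = 80
openness = rng.uniform(0, 1, n)
pore_size = rng.uniform(0, 1, n)
loading = rng.uniform(0, 1, n)
bet = 0.7 * openness + 0.3 * rng.uniform(0, 1, n)  # "intermediate" descriptor partly set by openness
y = 50 + 20 * bet + 10 * openness + 8 * loading * (1 - loading) + rng.normal(0, 1.5, n)

names = ["openness", "pore_size", "loading", "bet"]
X_full = np.column_stack([openness, pore_size, loading, bet])
drop = ["loading", "bet"]  # hypothetical intermediate variables to remove
keep = [i for i, name in enumerate(names) if name not in drop]

cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0)
r2_full = cross_val_score(rf, X_full, y, cv=cv, scoring="r2")
r2_reduced = cross_val_score(rf, X_full[:, keep], y, cv=cv, scoring="r2")
print(f"all descriptors:   R2 = {r2_full.mean():.3f} ± {r2_full.std():.3f}")
print(f"without {drop}: R2 = {r2_reduced.mean():.3f} ± {r2_reduced.std():.3f}")
```

Using the same `cv` object for both runs keeps the fold assignments identical, so the difference in scores reflects the removed descriptors rather than a different partitioning.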
6.5. Implications for Data-Driven Materials Modeling
The observed predictive performance demonstrates that interpretable machine-learning models can reliably capture structure–performance relationships in macroporous polyHIPE-based composites using existing experimental datasets, without requiring additional synthesis or testing. The consistent superiority of nonlinear ensemble and probabilistic models over linear baselines confirms that adsorption performance is governed by interacting effects across formulation variables, porous morphology, and physicochemical properties, in direct agreement with the experimentally observed complexity reported by Vallejo-Macías et al. [13].
More broadly, these results underscore the practical value of machine learning as an analytical extension of experimental composite research rather than a substitute for it. When applied within a rigorously validated and uncertainty-aware framework, machine learning enables efficient interrogation of high-dimensional design spaces that are otherwise explored only qualitatively or through limited parametric variation. At the same time, the findings emphasize the necessity of careful validation, explicit uncertainty quantification, and restraint in extrapolation when operating in the small-data regimes characteristic of advanced polymer matrix composites.
Within these constraints, data-driven modeling provides a powerful complement to experimental investigation by revealing latent structure–property–performance linkages, prioritizing influential design variables, and guiding subsequent interpretability-driven analyses. Accordingly, the present framework illustrates how existing composite datasets can be leveraged to extract generalizable insight and support rational materials design, even in the absence of high-throughput experimentation.
6.6. Interpretability-Driven Insights
The interpretability analyses provide insight into how formulation parameters, porous morphology, and physicochemical properties are jointly associated with adsorption performance and magnetic functionality in macroporous polyHIPE-based composites. Rather than identifying a single dominant variable, the results consistently indicate that functional performance emerges from coupled structural and compositional effects, in agreement with the experimentally observed behavior reported by Vallejo-Macías et al. [13].
Across all nonlinear models, descriptors associated with accessible surface area and pore connectivity rank among the most influential contributors to removal efficiency. In particular, the degree of openness and BET specific surface area exhibit strong positive associations with predicted adsorption performance, underscoring the central role of transport accessibility and available adsorption sites within the macroporous network. Partial dependence analysis further reveals that these effects are intrinsically nonlinear, with diminishing returns beyond intermediate values of openness and surface area. This behavior suggests that excessive structural openness does not necessarily translate into proportional performance gains, likely reflecting trade-offs between pore accessibility, effective surface utilization, and fluid–solid contact efficiency.
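Partial dependence curves of this kind can be computed with scikit-learn's `sklearn.inspection.partial_dependence`. The sketch below assumes synthetic data with a deliberately saturating response in descriptor 0, so that the curve should flatten at high values; comparing mean slopes over the first and last thirds of the grid is an ad hoc way to quantify diminishing returns, not a standard statistic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(8)
X = rng.uniform(0, 1, size=(80, 3))
# Saturating response in descriptor 0 mimics diminishing returns
y = 60 + 35 * (1 - np.exp(-5 * X[:, 0])) + 5 * X[:, 1] + rng.normal(0, 1, 80)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
# Average model prediction as descriptor 0 is swept over a 20-point grid,
# with the other descriptors kept at their observed values
pd_result = partial_dependence(rf, X, features=[0], grid_resolution=20, kind="average")
avg = pd_result["average"][0]
slopes = np.diff(avg)
print("PD at grid ends:", round(avg[0], 2), round(avg[-1], 2))
print("mean slope, first vs last third:", round(slopes[:6].mean(), 3), round(slopes[-6:].mean(), 3))
```

Because partial dependence averages over the observed values of the other descriptors, it can hide interactions; individual conditional expectation curves (`kind="individual"`) are the usual complement when sample-level heterogeneity matters.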
The loading of γ-Fe2O3 nanoparticles emerges as a key descriptor influencing both adsorption efficiency and magnetic response. Interpretability results indicate that moderate nanoparticle loadings contribute positively to removal efficiency, potentially through increased surface heterogeneity and enhanced availability of adsorption sites. However, higher loadings are associated with plateauing or slightly reduced predicted removal efficiency, highlighting a multifunctional trade-off between magnetic recoverability and adsorption effectiveness. This outcome is physically plausible, as excessive nanoparticle incorporation may partially obstruct pore windows, promote aggregation, or reduce the effective polymeric surface available for dye adsorption.
Beyond the pore-blocking hypothesis, several microstructural mechanisms may plausibly contribute to the negative SHAP contributions observed at high γ-Fe2O3 loadings. First, excessive nanoparticle deposition can partially mask the polymeric surface, reducing the availability of functional groups responsible for dye–polymer interactions, even as the inorganic content increases. Second, high nanoparticle loading may promote aggregation or clustering within pore walls and windows, locally reducing effective pore connectivity and increasing transport resistance, particularly in regions where macropore–window constrictions dominate mass transfer. Third, dense inorganic domains can alter surface wettability and local electrostatic environments, potentially reducing affinity for cationic dye molecules despite increased nominal surface area. Finally, increased γ-Fe2O3 content may stiffen pore walls or reduce nanoscale surface roughness, indirectly affecting adsorption kinetics and accessibility. Importantly, these mechanisms are not mutually exclusive and are interpreted as associative hypotheses consistent with the observed SHAP patterns rather than definitive causal explanations. Their identification highlights the inherent multifunctional trade-off in magnetic polyHIPE composites, where maximizing magnetic response may impose structural and interfacial penalties that limit adsorption performance, thereby motivating balanced rather than extreme nanoparticle loading strategies.
Formulation variables, including internal phase type and surfactant content, do not appear as dominant predictors when considered in isolation. Instead, their influence is primarily indirect, operating through their impact on porous morphology and emergent physicochemical properties. This finding reinforces the hierarchical structure–property–performance framework illustrated in Figure 1, in which formulation parameters shape pore architecture, which in turn governs accessible surface area and functional outcomes. The absence of strong direct formulation effects helps explain why one-factor-at-a-time or purely parametric analyses often fail to capture performance trends in polyHIPE systems.
Importantly, the identified trends are consistent across multiple model classes, lending confidence that the interpretability results reflect robust, data-supported associations rather than model-specific artifacts. At the same time, variability in feature contributions across individual samples highlights the context-dependent nature of performance drivers, emphasizing that optimal design strategies require coordinated tuning of multiple descriptors rather than maximization of any single variable.
Taken together, these interpretability-driven insights translate predictive modeling results into practical guidance for materials design within the experimental domain of the curated datasets. The analysis indicates that high adsorption performance in polyHIPE-based magnetic composites is favored by balanced combinations of pore openness, accessible surface area, and nanoparticle loading, rather than extreme values of individual parameters. By explicitly revealing these trade-offs, the interpretable machine-learning framework provides a transparent basis for rational formulation screening and for prioritizing future experimental exploration.
6.7. Limitations
Several limitations of the present study should be acknowledged in order to properly contextualize the reported results.
First, the analysis is based on a finite set of previously published experimental datasets derived from closely related polyHIPE-based magnetic composite systems [13]. Although these datasets are well characterized and representative of an important class of macroporous polymer matrix composites, the learned relationships are inherently constrained to the formulation space, synthesis protocols, and performance ranges explored in the source studies. Consequently, model predictions and interpretability outcomes should not be extrapolated beyond this experimental domain without additional experimental validation.
Second, the available data fall within the small-data regime typical of advanced porous composite fabrication, where synthesis and characterization are experimentally intensive and low-throughput. While the modeling framework explicitly addresses this limitation through conservative model selection, repeated cross-validation, feature parsimony, and uncertainty-aware interpretation, limited sample sizes inevitably reduce statistical power and increase sensitivity to individual data points. Accordingly, the results emphasize robust qualitative trends and relative feature influence rather than precise numerical optimization or universal scaling laws.
Third, the machine-learning models employed in this work identify associative patterns between descriptors and performance metrics rather than establishing strong physical causality. Although interpretability tools such as permutation importance, SHAP analysis, and partial dependence plots provide valuable insight into dominant drivers and nonlinear interactions, these findings should be interpreted as data-supported hypotheses consistent with existing physical understanding, not as definitive mechanistic proof. Establishing causal relationships would require targeted experimental designs or physics-based modeling approaches beyond the scope of the present study.
Fourth, the selection of removal efficiency (%) at fixed operating conditions as the primary performance metric—while advantageous for practical comparability across studies—does not capture the full kinetic or thermodynamic complexity of adsorption processes. Parameters such as adsorption rates, equilibrium capacities over broader concentration ranges, or performance under varying pH and temperature conditions were intentionally excluded to maintain a focused and internally consistent modeling target. Future work could extend the present framework to incorporate multi-condition, time-resolved, or multi-objective performance metrics.
Fifth, heterogeneity in SEM image acquisition and analysis protocols across source studies constitutes an additional source of uncertainty. Differences in magnification, thresholding, and segmentation strategies inevitably affect reported morphological descriptors such as pore size and degree of openness. While these effects introduce protocol-dependent variability, they are representative of the challenges inherent to synthesizing literature data across independent experimental efforts. Importantly, the persistence of consistent feature importance rankings and interpretable trends across models suggests that the dominant structure–performance relationships identified in this work are robust to moderate methodological variability. Nevertheless, future cross-study analyses would benefit from standardized image analysis pipelines or access to raw micrographs to enable unified post-processing.
Sixth, an additional source of uncertainty arises from the use of digitized data extracted from published figures when numerical values were not explicitly reported. Although repeated digitization was employed to reduce operator-dependent variability, such values inevitably carry higher uncertainty than directly tabulated measurements. Rather than treating digitized data as exact inputs, the present framework interprets them as approximate representations within the experimental domain. The consistency of observed trends across models and validation strategies suggests that the principal structure–property–performance relationships identified are robust to this level of measurement noise; nonetheless, future studies would benefit from fully tabulated datasets to further improve model fidelity.
Seventh, the use of removal efficiency as the sole performance target necessarily abstracts away detailed kinetic and thermodynamic information associated with adsorption processes. Differences in adsorption rates, diffusion regimes, or equilibrium isotherm shapes may therefore be partially masked when performance is represented by a single efficiency value at fixed conditions. This abstraction is intentional and aligned with the objectives of the present study, which focuses on identifying robust, comparable structure–performance associations across heterogeneous literature data rather than resolving detailed adsorption mechanisms. Accordingly, conclusions are restricted to the defined operating conditions and should not be interpreted as comprehensive descriptors of adsorption behavior across broader concentration, time, or pH ranges.
Eighth, hydrodynamic and mixing-related effects were implicitly treated as unobserved variables. Differences in agitation intensity, particle size, and fluid–solid contact conditions across studies may contribute to variability in reported removal efficiencies that cannot be explicitly resolved within the current dataset. Rather than representing a methodological oversight, this limitation reflects the constraints inherent to literature-derived data synthesis. The consistency of model performance under cross-validation and the persistence of interpretable trends suggest that the dominant structure–performance associations identified are not driven solely by hydrodynamic artifacts. Nevertheless, future studies incorporating standardized hydrodynamic descriptors or controlled flow conditions would enable more complete separation of material-intrinsic and process-dependent effects.
Ninth, descriptors such as BET specific surface area and nanoparticle loading occupy an intermediate position within the structure–property–performance hierarchy: they are partly determined by formulation and processing, yet they are used as inputs alongside formulation variables, which raises the possibility of redundant information or partial information leakage. Although the sensitivity analysis mitigates this concern, future studies could further reduce reliance on such intermediate variables by incorporating more direct morphological descriptors or by explicitly modeling hierarchical dependencies.
A further limitation is the omission of pore-size distribution width or polydispersity metrics from the morphological descriptor set. Variability in pore size distributions can influence local transport pathways and adsorption heterogeneity, and its exclusion may affect the magnitude and dispersion of feature contributions observed in interpretability analyses such as SHAP. However, because distribution width was not consistently reported across studies, its inclusion would have substantially reduced the usable dataset and undermined cross-study comparability. Consequently, SHAP-based interpretations should be understood as reflecting average structural effects rather than local variability within individual pore networks. Future work incorporating standardized reporting of pore-size distributions or raw image data would enable more granular assessment of how morphological heterogeneity modulates structure–performance relationships.
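The distinction between average and local effects can be made concrete with Shapley values, which SHAP estimates. For a handful of features they can be computed exactly by enumerating feature coalitions, and a global importance is then typically taken as the mean absolute value across samples, which averages out sample-level variation. The sketch below assumes a single baseline point (the SHAP package usually averages over a background dataset instead); `toy_surrogate` and its coefficients are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline point.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def mean_abs_importance(model, samples, baseline):
    """Global importance: mean |Shapley value| of each feature over all samples."""
    all_phi = [shapley_values(model, x, baseline) for x in samples]
    return [sum(abs(p[i]) for p in all_phi) / len(all_phi) for i in range(len(baseline))]


# Hypothetical surrogate with an openness x surface-area interaction (illustrative only).
def toy_surrogate(z):
    openness, bet_area, loading = z
    return 50 * openness + 0.1 * bet_area + 0.2 * openness * bet_area - 0.3 * loading


samples = [[0.6, 30.0, 5.0], [0.8, 45.0, 10.0]]
baseline = [0.5, 25.0, 7.5]
print(mean_abs_importance(toy_surrogate, samples, baseline))
```

The Shapley values of each sample sum to `model(x) - model(baseline)`; the interaction term is split between the two interacting features, which is one reason coupled descriptors can appear jointly important.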
Finally, several constraints arise when considering translation toward process-scale implementation. The synthesis of polyHIPE-based composites is inherently sensitive to emulsion stability, mixing protocols, and phase volume fractions, all of which may scale nonlinearly with batch size and reactor geometry. As a result, morphological descriptors identified as performance drivers—such as degree of openness or accessible surface area—may be more difficult to reproduce consistently under large-scale or continuous processing conditions. Moreover, the multifunctional trade-off identified between adsorption efficiency and magnetic saturation suggests that scale-up strategies prioritizing rapid magnetic separation (e.g., higher γ-Fe2O3 loading) may incur performance penalties in adsorption-driven applications. At the process level, this trade-off may manifest as increased pressure drop, reduced mass-transfer efficiency, or diminished effective surface accessibility, particularly in packed-bed or flow-through configurations.
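When both adsorption efficiency and magnetic saturation matter, formulations can be screened for Pareto optimality: a candidate is retained only if no other candidate is at least as good in both objectives and strictly better in one. The sketch below assumes both objectives are maximized; the candidate labels and values are hypothetical.

```python
def pareto_front(candidates):
    """Return labels of candidates not dominated in (removal efficiency, Ms).

    Both objectives are maximized. A candidate is dominated if another one is at
    least as good in both objectives and strictly better in at least one.
    """
    front = []
    for name, eff, ms in candidates:
        dominated = any(
            e2 >= eff and m2 >= ms and (e2 > eff or m2 > ms)
            for _, e2, m2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical formulations: (label, removal efficiency %, saturation magnetization emu/g).
CANDIDATES = [
    ("A", 92.0, 8.0),
    ("B", 85.0, 15.0),
    ("C", 80.0, 14.0),
    ("D", 70.0, 25.0),
]
print(pareto_front(CANDIDATES))  # → ['A', 'B', 'D']
```

Here C is dropped because B is better in both objectives, while A, B, and D represent different positions along the efficiency and magnetic-separation trade-off; choosing among them is a process-level decision.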
Taken together, these considerations indicate that while the present framework can inform formulation and microstructural targets, successful process scaling will require coordinated optimization of material design and operating conditions. The interpretable machine-learning approach developed here is therefore best viewed as a screening and prioritization tool that can reduce experimental burden in early-stage design, rather than as a substitute for pilot-scale validation.
6.8. Applicability Domain with Respect to Dye Chemistry and pH Conditions
The generalizability of the present machine-learning analysis is explicitly constrained by the chemical and operational domain represented in the curated dataset. All adsorption performance data used for model training and interpretation correspond to methylene blue, a cationic dye, evaluated under fixed aqueous conditions and within a narrow pH range as reported in the source studies [13]. Accordingly, the learned structure–property–performance associations should be interpreted as valid only for adsorption systems dominated by electrostatic interactions and accessibility effects characteristic of cationic organic dyes.
Extension of the present conclusions to anionic or zwitterionic dyes, or to systems in which specific chemical interactions—such as hydrogen bonding, chelation, or π–π stacking—dominate adsorption behavior, is not supported by the available data and therefore lies outside the current applicability domain. Likewise, pH-dependent effects—including surface charge reversal of the polymer matrix, changes in iron-oxide nanoparticle surface chemistry, or dye speciation—were not explicitly modeled and may substantially alter adsorption mechanisms beyond the conditions represented here.
For these reasons, the machine-learning–derived trends identified in this work should be regarded as conditional associations within a well-defined experimental domain, rather than as universal descriptors of adsorption behavior across dye classes or operating environments. Future studies incorporating multi-dye systems and controlled pH variation will be required to assess the transferability of the identified drivers and to expand the applicability domain of the proposed modeling framework.
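A simple way to operationalize such a domain is to flag any query whose dye, pH, or descriptor values fall outside what the training data cover. The sketch below uses a per-descriptor min/max box, which is only one of several common applicability-domain definitions (distance- or leverage-based checks are alternatives); the `Domain` class, the helper names, and the numeric ranges are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical applicability domain derived from a training set."""
    ranges: dict      # descriptor name -> (min, max) observed in training
    dye: str          # dye represented in the training data
    ph_range: tuple   # (min, max) pH represented in the training data


def build_domain(training_rows, dye, ph_range):
    names = training_rows[0].keys()
    ranges = {k: (min(r[k] for r in training_rows), max(r[k] for r in training_rows)) for k in names}
    return Domain(ranges, dye, ph_range)


def out_of_domain_reasons(domain, query):
    """List reasons a query lies outside the domain (empty list means inside)."""
    reasons = []
    if query["dye"] != domain.dye:
        reasons.append(f"dye {query['dye']!r} not represented")
    lo, hi = domain.ph_range
    if not lo <= query["pH"] <= hi:
        reasons.append(f"pH {query['pH']} outside [{lo}, {hi}]")
    for name, (lo, hi) in domain.ranges.items():
        v = query["descriptors"][name]
        if not lo <= v <= hi:
            reasons.append(f"{name}={v} outside [{lo}, {hi}]")
    return reasons


# Hypothetical training descriptors and pH window (illustrative values only).
train = [{"openness": 0.3, "bet_area": 10.0}, {"openness": 0.7, "bet_area": 40.0}]
domain = build_domain(train, "methylene blue", (6.0, 7.0))
print(out_of_domain_reasons(domain, {"dye": "methyl orange", "pH": 3.0,
                                     "descriptors": {"openness": 0.9, "bet_area": 20.0}}))
```

A box check of this kind is deliberately conservative about chemistry (any unseen dye is rejected) but permissive about descriptor combinations, since a point inside every individual range can still lie in a sparsely sampled corner of the joint space.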
7. Conclusions
This study demonstrates that interpretable machine learning (ML) constitutes an effective and scientifically rigorous framework for elucidating structure–property–performance relationships in macroporous polyHIPE-based magnetic polymer composites using exclusively existing experimental data. By integrating formulation parameters, quantitative morphological descriptors, and physicochemical properties, the proposed approach achieves reliable prediction of removal efficiency (%) under fixed operating conditions, while explicitly accounting for uncertainty in a small-data regime.
Comparative evaluation of linear, ensemble-based, and probabilistic models confirms that nonlinear approaches—particularly tree-based ensembles and Gaussian Process Regression—substantially outperform linear baselines. This outcome reinforces the conclusion that adsorption performance in polyHIPE-based composites is governed by coupled, nonlinear interactions rather than simple monotonic dependencies on individual descriptors. Importantly, these predictive gains are achieved without compromising robustness, owing to conservative model selection, repeated cross-validation, and deliberate feature parsimony.
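Repeated cross-validation reshuffles the data several times and evaluates each shuffle with k-fold splitting, so reported scores are less sensitive to one particular partition of a small dataset. The generator below is a minimal stdlib sketch of the index bookkeeping, comparable in spirit to scikit-learn's `RepeatedKFold`; the function name and default parameters are assumptions, not the study's settings.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=3, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation.

    Each repeat reshuffles the sample order and partitions it into n_splits
    folds of near-equal size; every sample is tested exactly once per repeat.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Spread the remainder so the first folds receive one extra sample each.
        sizes = [n_samples // n_splits + (1 if k < n_samples % n_splits else 0)
                 for k in range(n_splits)]
        start = 0
        for size in sizes:
            test = order[start:start + size]
            train = order[:start] + order[start + size:]
            yield train, test
            start += size


splits = list(repeated_kfold_indices(12, n_splits=5, n_repeats=3))
print(len(splits), [len(test) for _, test in splits[:5]])
```

Model scores would then be averaged over all `n_splits * n_repeats` folds, and their spread gives a rough indication of how stable the comparison between model families is.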
Crucially, the value of the framework extends beyond predictive accuracy. Interpretability analyses reveal that functional performance emerges from a balanced interplay between porous openness, accessible surface area, and γ-Fe2O3 nanoparticle loading, while formulation variables influence performance primarily through their indirect effects on structure and emergent physicochemical properties. These findings clarify why conventional one-variable-at-a-time analyses often fail to capture performance trends in polyHIPE systems and explicitly expose the trade-offs between adsorption efficiency and magnetic functionality that define the multifunctional design space.
The analysis is intentionally constrained to the experimental domain represented by the curated datasets, and no claims of strong physical causality or extrapolative prediction are advanced. Within these bounds, the results demonstrate that interpretable, uncertainty-aware ML can extract design-relevant insight from limited experimental data in a transparent and reproducible manner. More broadly, this work illustrates how data-driven modeling can productively complement experimental studies of multifunctional polymer matrix composites, enabling rational screening of formulation strategies and guiding targeted future experimentation while limiting additional experimental burden.
Experimentally Actionable Recommendations
The interpretability-driven insights obtained in this study can be translated into concrete experimental actions to refine and extend the design space of macroporous polyHIPE-based magnetic composites. Based on model uncertainty, partial dependence analysis, and observed multifunctional trade-offs, the following experimental directions are specifically recommended:
Targeted tuning of pore openness near the saturation regime. Conduct controlled synthesis experiments varying surfactant content and internal phase composition to generate samples with degree of openness values concentrated around the intermediate–high range identified by the PDP and ICE analyses. This would experimentally validate the predicted diminishing returns and clarify whether the observed saturation reflects transport limitations or synthesis-induced constraints.
Decoupling surface area and pore connectivity. Design formulations that achieve comparable BET surface areas but different pore connectivity (e.g., via surfactant type or post-polymerization treatments) to disentangle their coupled effects on adsorption performance, which were identified as jointly dominant in the ML analysis.
Systematic variation in γ-Fe2O3 loading at fixed morphology. Perform experiments in which magnetic nanoparticle loading is varied while maintaining similar pore size and openness, to directly test the predicted non-monotonic effect of γ-Fe2O3 loading and to distinguish pore blockage, aggregation, and surface masking mechanisms.
Focused data acquisition in high-uncertainty regions. Prioritize new experiments in regions of the descriptor space associated with elevated predictive uncertainty (as identified by GPR variance and ensemble dispersion), particularly combinations of high openness and high nanoparticle loading that are underrepresented in the current dataset.
Extension to condition-dependent performance metrics. For selected representative formulations, measure removal efficiency across multiple pH values, contact times, or dye concentrations to assess whether the identified structural drivers remain dominant under varying operating conditions.
Cross-dye validation with chemically distinct dyes. Test a small number of anionic and neutral dyes on structurally comparable polyHIPE composites to evaluate whether the learned structure–performance relationships transfer beyond methylene blue while the porous architecture class is held constant.
Collectively, these targeted experiments would directly test the hypotheses generated by the interpretable ML framework, improve model robustness, and progressively expand the applicability domain without requiring large-scale experimental campaigns.
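The first recommendation relies on partial dependence (PDP) and individual conditional expectation (ICE) curves to locate where the response to openness flattens. The sketch below shows how both can be computed for any fitted surrogate exposed as a callable: each ICE curve varies one feature for a single sample, and the PDP is their pointwise average. The `toy_surrogate` function, its saturating form, and the grid values are hypothetical.

```python
import math


def ice_curves(model, samples, feature, grid):
    """Individual conditional expectation: one curve per sample, varying one feature."""
    curves = []
    for x in samples:
        curve = []
        for v in grid:
            z = list(x)
            z[feature] = v  # replace only the feature of interest
            curve.append(model(z))
        curves.append(curve)
    return curves


def partial_dependence(model, samples, feature, grid):
    """Partial dependence: the pointwise average of the ICE curves."""
    curves = ice_curves(model, samples, feature, grid)
    return [sum(c[j] for c in curves) / len(curves) for j in range(len(grid))]


# Hypothetical saturating surrogate (illustrative only): openness in [0, 1], loading in wt%.
def toy_surrogate(z):
    openness, loading = z
    return 95 * (1 - math.exp(-4 * openness)) - 0.5 * loading


grid = [0.2, 0.4, 0.6, 0.8]
pdp = partial_dependence(toy_surrogate, [(0.5, 5.0), (0.7, 10.0)], feature=0, grid=grid)
print([round(v, 1) for v in pdp])
```

Diverging ICE curves would indicate that the openness effect depends on other descriptors, such as nanoparticle loading, which a PDP alone would hide; scikit-learn offers comparable functionality via `sklearn.inspection.partial_dependence`.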