Mathematical and Neuro-Fuzzy Modeling of a Hollow Fiber Membrane System for a Petrochemical Process

Bryand J. Garcia-Sigales; Jose A. Ruz-Hernandez; Jose-Luis Rullan-Lara; Alma Y. Alanis; Mario Antonio Ruz Canul; Juan Carlos Gonzalez Gomez; Francisco J. Romero-Sotelo

doi:10.3390/chemengineering9060115

,

and

¹

Facultad de Ingeniería, Universidad Autonoma del Carmen (UNACAR), Ciudad del Carmen 24180, Campeche, Mexico

²

Departamento de Innovacion Basada en la Informacion y el Conocimiento, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara (U.de.G), Guadalajara 44430, Jalisco, Mexico

^*

Authors to whom correspondence should be addressed.

ChemEngineering2025, 9(6), 115;https://doi.org/10.3390/chemengineering9060115

This article belongs to the Collection New Advances in Chemical Engineering

Version Notes

Order Reprints

Abstract

This work presents a hybrid model that integrates a mechanistic multicomponent transport scheme in hollow-fiber membranes with an Adaptive Neuro-Fuzzy Inference System (ANFIS). The physical model incorporates pressure drops on the feed and permeate sides (Hagen–Poiseuille), non-ideal gas behavior (Peng–Robinson equation of state), and temperature-dependent viscosity; species permeances are treated as constant for model validation. After validation, a post-validation parametric exploration of permeance variability is carried out by perturbing the methane (CH₄) permeance by one decade up and down. From an initial set of 18 variables, 4 key parameters were selected through rigorous statistical analysis (Pearson correlation, variance inflation factor (VIF), and mean absolute error (MAE)); likewise, other physical criteria have been considered: permeance, retentate volume, retentate pressure, and retentate viscosity. Trained with 70% of the simulated data and validated with the remaining 30%, the model achieves a coefficient of determination (R²) close to 0.999 and a root mean square error (RMSE) below 8 × 10⁻⁸ m³/h in predicting the methane volume in the retentate, effectively responding to both steady and dynamic fluctuations. The combination of first-principles modeling and adaptive learning captures both steady-state and dynamic behavior, positioning the approach as a viable tool for real-time analysis and supervisory control in petrochemical membrane operations.

Keywords:

hollow fiber membranes; artificial intelligence; neural networks; ANFIS; hybrid modeling; petrochemical process; gas separation

1. Introduction

Daily operation of oil and gas production systems requires critical decisions that directly affect production volumes and associated costs [1]. These decisions, made at different organizational levels, converge in the physical production system. In recent years, the industry has evolved toward deeper technological integration, giving rise to the paradigm known as Industry 4.0. This approach promotes the use of intelligent technologies to improve operational efficiency [2].

In the context of natural gas processing, which consists mainly of methane (CH₄), one of the most relevant challenges is the removal of carbon dioxide (CO₂), as it reduces the calorific value of the gas and accelerates pipeline corrosion. Membrane-based separation technologies, particularly hollow fiber membrane modules (HFMM), have gained importance due to their energy efficiency, modularity, and compactness. These membranes operate through the principle of selective permeation, leveraging differences in solubility and molecular size to separate CH₄ from CO₂ without the need for chemical solvents.

Gas separation in petrochemical processes has therefore become an active area of research, with CH₄/CO₂ separation receiving significant attention in recent decades due to the dual objectives of natural gas purification and carbon emission reduction. From a modeling perspective, multiple approaches have been proposed. Chu et al. [3] developed steady-state mass transfer models, while Gu [4] extended this framework by including non-ideal gas effects, concentration profiles, and pressure drops, achieving accurate prediction of experimental data. Later, Ko [5] introduced a dynamic model that considered transient disturbances in operating conditions, enabling more realistic simulations of industrial processes. However, purely physical or “white-box” models require precise estimation of parameters such as permeability, viscosity, and molar volume, which are not always available and often present high uncertainty, particularly in multicomponent systems.

To alleviate these limitations, artificial intelligence (AI) methods have been explored. Artificial neural networks (ANNs) model complex nonlinearities but lack physical interpretability and may generalize poorly beyond the training domain. This motivates hybrid (“gray-box”) strategies that fuse physics with data-driven inference. Among these, the adaptive neuro-fuzzy inference system (ANFIS) stands out by combining the learning capacity of ANNs with fuzzy rule-based reasoning, improving transparency and adaptability [6].

Recent work signals a shift toward hybrid intelligence in gas separation (see Table 1). In polymeric hollow fibers, a GA-optimized ANFIS—genetic algorithm (GA) tuning and ≈7 Gaussian membership functions—predicted permeate-side CO₂ for CH₄/CO₂ separations with near-perfect fit (R² ≈ 0.9993; RMSE ≈ 0.0064; Average Absolute Relative Deviation (AARD) ≈ 1.25%) [7]. In mixed-matrix membranes (MMMs) comprising fumed silica (FS), polyhedral oligomeric silsesquioxane (POSS), and polydimethylsiloxane (PDMS), differential evolution (DE)–ANFIS was benchmarked against a Crow Search Algorithm–Least-Squares Support Vector Machine (CSA-LSSVM) for single-gas permeation (H₂/CH₄/CO₂/C₃H₈): DE-ANFIS reached R² = 0.9981 (overall), while CSA-LSSVM reported R² = 0.9946/0.9689 (train/test) with MSE = 0.0003/0.0011 and MAE = 0.0114/0.0257 [8]. For silicoaluminophosphate-34 (SAPO-34) MMMs, clustering-guided ANFIS variants—subtractive clustering (SC) and Fuzzy C-Means (FCM)—as well as genetic programming (GP), reproduced CO₂ permeability with AARD < 3% and R² > 0.995 [9]. In polymethylpentene (PMP) modified with nanoparticles, a multilayer perceptron (MLP) with Bayesian regularization (3–8–1) captured “separation capacity” (CO₂ permeability) with R = 0.99477, MAE = 6.87, AARD = 5.46%, MSE = 152.75 [10]. By contrast, an empirical-plus-regression HFMM reported a modest R² ≈ 0.628 for CO₂ enrichment/distribution, underscoring the need to embed dynamic multicomponent transport and deliver real-time, explainable responses beyond black-box regression [11].

Table 1. Comparable studies.

Table 2 widens the lens to adjacent problems with transferable patterns. In electro-assisted ultrafiltration of water, a Takagi–Sugeno–Kang (TSK) fuzzy model predicted Ni²⁺ removal with R² ≥ 0.98 and maximum rejections of ≈60% (polyethersulfone, 5 kDa (PBCCS) at ~3.5–4.5 V) and ≈45% (regenerated cellulose, 5 kDa (PLCCS) at 4 V) [12]. For C₃H₆/C₃H₈ adsorption in copper benzene-1,3,5-tricarboxylate (Cu-BTC, HKUST-1), particle swarm optimization (PSO)–ANFIS was compared to ANN; ANN yielded the lower MAE (0.111 vs. 0.421) [13]. CO₂ solubility in Polystyrene (PS)/ Poly(vinyl acetate) (PVAc)/ Polybutylene succinate (PBS)/ Poly(butylene succinate-co-adipate)(PBSA) was modeled via genetic programming (GP) with R² > 0.98 and small average relative deviations (ARDs) (≈0.095%, 0.0503%, 0.0312%, 0.039%; 70/30 split; inputs T[K], P[MPa]) [14]. A computational fluid dynamics (CFD) → ANFIS mapping reproduced bubble-column void fraction: with two inputs and eight membership functions (MFs), it attained R ≈ 0.9999, although adding turbulence reduced performance (R_test ≈ 0.64) [15]. For Cu/H₂O nanofluids in a lid-driven square cavity, a grid-partition ANFIS (3 inputs: x, y, T; 4 MFs/input → 64 rules) achieved R ≈ 0.999 (~65% training; up to ~800 iterations); a compact ant colony optimization (ACO) surrogate reached R≈0.92 [16]. CO₂ absorption in a closed-vessel nanofluid absorber was learned by an MLP (6–9–1) trained with the Levenberg–Marquardt algorithm (LM), yielding R = 0.9996, MSE = 2.36 × 10⁻⁵, MAE% = 0.326 (N = 165) [17]. In forward osmosis (FO) for textile wastewater, combinations of response surface methodology (RSM), ANN (feed-forward backpropagation with Levenberg–Marquardt, FFBP-LM) and ANFIS (Sugeno; five inputs) predicted/optimized water flux (Jw) and reverse salt flux (Js); the ANN evaluation reported Jw: R² ≈ 0.798, R ≈ 0.8933, MAD ≈ 0.8120, MSE ≈ 1.4800, RMSE ≈ 1.2170; Js: R² ≈ 0.7807, R ≈ 0.8836, Mean Absolute Deviation (MAD) ≈ 0.7940, MSE ≈ 1.5220, RMSE ≈ 1.2340; a 70/15/15 protocol achieved R²_train ≈ 0.998, R²_test ≈ 0.997, RMSE_train/test ≈ 0.0136/0.0194 [18]. In bubble-column CO₂ absorption with SiO₂/H₂O and Fe₃O₄/H₂O, an ANFIS with Fuzzy C-Means (ANFIS-FCM) (3 rules; hybrid learning) produced RMSE ≈ 0.014–0.019; optimal nanoparticle (NP) loadings were 0.025 wt.% SiO₂ and 0.015 wt.% Fe₃O₄, enhancing the mass-transfer enhancement factor (E) to ≈ 1.43 (+43%) and 1.13 (+13.4%), respectively [19]. For sheet-type water permeation, a probabilistic neural network–group method of data handling (PNN–GMDH) with PSO optimized permeate flux, achieving R² ≈ 0.983, performance index (PI) = 0.723, and Nash–Sutcliffe efficiency (NSE) ≈ 0.984 [20].

Table 2. Adjacent studies.

CO₂ absorption correlations in nanofluids built with GP and GMDH showed GP outperforming GMDH (R² = 0.9914, AARD = 3.732%, RMSE = 0.0141 vs. R² = 0.9726, AARD = 8.1134%, RMSE = 0.0231; n = 230; 80/20; inputs t, P, T, d_n_p, C wt.%, ρ_n_p) [21]. In metal–organic frameworks (MOFs), a random forest (RF) predicted CH₄/CO₂ biogas separation (main metrics in Supplementary) [22]. Finally, porous liquids (PLs) show a clear AARD ranking for CO₂ solubility: 3.17% (CSA-LSSVM) < 6.64% (MLP) < 8.67% (PSO-ANFIS) < 12.98% (ANFIS) [23].

Against this backdrop, our study integrates a mechanistic, multicomponent HFMM model—accounting for pressure drops, non-ideal gas behavior via the Peng–Robinson equation of state (PR-EOS), and temperature-dependent viscosity—with an ANFIS layer trained on statistically paired, physically meaningful inputs (permeance, retentate volume, retentate pressure, retentate mixture viscosity).

This gray-box synthesis delivers near-perfect agreement (R² ≈ 0.999; RMSE < 8 × 10⁻⁸ m³·h⁻¹) for retentate methane volume while retaining interpretability and enabling real-time use, directly addressing the literature gap on dynamic CH₄/CO₂ separation in HFMM. In doing so, it reduces parameter uncertainty by coupling first-principles transport with data-driven inference; optimizes membrane design and operating conditions to maximize CH₄ purity and flow rate; and provides a robust, interpretable, and adaptive system capable of operating under variable conditions—aligning with the principles of Industry 4.0.

To ensure the clarity and reproducibility of the study, it is essential to detail the methodological sequence followed to construct the hybrid model. The starting point was the development of a mechanistic, multicomponent transport model. To ensure its physical fidelity before any subsequent use, this model was rigorously validated against established experimental data from the literature, using the Ko model [5] as a benchmark.

Once the reliability of the mechanistic model was confirmed, it was employed as a “virtual plant” to generate a comprehensive synthetic dataset, thus overcoming the lack of a dedicated experimental dataset for training. Subsequently, to optimize the quality of the inputs for the intelligent model, this synthetic dataset was subjected to a rigorous variable selection process. Based on the statistical criteria detailed in Section 2.2, the initial set of 18 candidate predictors was reduced to the four most informative and statistically independent ones.

Finally, this refined dataset served as the foundation for training and testing the ANFIS layer. The ultimate purpose of this work is for the developed ANFIS model, thanks to its high precision and speed, to serve in a future stage as the predictive core within an advanced optimization and control strategy, whose fundamental structure is illustrated in Figure 1. In such a scheme, the model would predict the system’s behavior, allowing an optimizer, guided by an objective function (specifically, to maximize the amount of methane obtained in the retentate), to determine the optimal control actions for the process.

Figure 1. Basic structure of model-based optimization.

This paper is organized as follows. The Section 2 (Materials and Methods) comprises three subsections: Section 2.1 develops the multicomponent HFMM mechanistic model, including temperature-dependent viscosity and non-ideal gas behavior; Section 2.2 presents the statistical scaffold—data normalization, a normalized multiple linear regression baseline, assumption diagnostics, and multicollinearity/variable selection; and Section 2.3 details the ANFIS layer (membership functions, rule base, hybrid training, and the train/validation/test split). Section 3 presents the results, and Section 4 concludes the paper.

2. Materials and Methods

This section details the methodology used to develop a hybrid model that predicts gas separation in an HFMM. The process begins with the construction of a mathematical model based on first principles, which is rigorously validated. Subsequently, this model is used as a “virtual plant” to generate a synthetic dataset. Finally, the generated data are subjected to a thorough statistical analysis to select the most influential variables, which serve as input for training the intelligent ANFIS model. To further enhance reproducibility, Appendix A (Algorithms A1–A5) provides a series of detailed pseudocodes illustrating the implementation of the mechanistic model, data generation, statistical analysis, and the ANFIS training workflows.

2.1. Mathematical Modeling of HFMM

This model aims to describe the behavior of the separation system through mathematical relationships that account for the gas properties and the membrane’s characteristics. The most critical considerations in this model are described as follows:

The membrane does not deform under the applied pressure. This is a simplifying assumption [3].
Pressure drop on the permeate and retentate sides can be calculated using the Hagen-Poiseuille equation.
Permeance of chemical species is considered constant in model validation (see Section 2.1.3).

This model is based on a set of differential equations that describe the variation of the component flow along the membrane. Equation (1) describes the rate of change in the molar flow of component i on the retentate side of the membrane in the flow direction. Equation (2) represents the rate of change in the molar flow of component i on the permeate side. Equation (3) details the pressure variation along the membrane on the retentate side, considering the dynamic viscosity of the mixture and membrane geometry. Finally, Equation (4) describes pressure variation on the permeate side; see Figure 2 [3,24].

\frac{d v_{i}}{d z} = - {P m}_{i} (p_{i}^{R e} - p_{i}^{P e}) D_{o} π N

(1)

\frac{d u_{i}}{d z} = {P m}_{i} (p_{i}^{R e} - p_{i}^{P e}) D_{o} π N

(2)

\frac{d P}{d z} = \frac{192 N D_{o} (D + N D_{o}) R T μ_{m i x}^{R e}}{π {(D^{2} - N D_{o}^{2})}^{3} P} v

(3)

\frac{d p}{d z} = - \frac{128 R T μ_{m i x}^{P e}}{π D_{i}^{2} N p} u

(4)

where

{P m}_{i}

is permeance of component i

\frac{m o l}{m^{2} \cdot a t m \cdot s}

,

p_{i}^{R e}

and

p_{i}^{P e}

denote partial pressures of component in the retentate and permeate section,

a t m

,

N

is the number of tubes or fibers,

i

is component index,

u

is flow rate on the permeate side

(m o l \cdot s)

and

v

is the flow rate on the retentate side

(m o l \cdot s

). The internal diameter of the shell is denoted by

D

(m)

, while

D_{i}

and

D_{o}

refer to the internal and external diameter of the fiber

(m)

, respectively,

R

is the universal gas constant,

T

is the temperature, and

μ_{m i x}

is the dynamic viscosity of the mixture

(P a \cdot s)

. Superscripts

R e

and

P e

are employed to denote quantities in the retentate and permeate streams, respectively. These notations are consistently applied to other variables in subsequent sections to facilitate further analyses.

Figure 2. Countercurrent flow dynamics in an HFMM scheme.

To solve the differential equations, the MATLAB^® (version R2025b) function ode15s is used. This function employs an adaptive implicit method based on backward differentiation formulas (BDF). This approach is particularly suitable for stiff systems, as it can handle rapid and slow variations in a stable and efficient manner. First, the model equations and initial conditions were defined (see Table 3 and Table 4). Then, error tolerances and the integration interval were configured. This setup allowed for precise and stable numerical solutions for the system of interest. This model has proven valid with data from studies in [3,25,26].

Table 3. Initial conditions of the Process.

Table 4. Initial conditions of chemical species.

2.1.1. Dynamic Determination of Viscosity

According to [27], the viscosity of a pure fluid can be determined using the Chapman-Enskog solution, which is based on the Boltzmann transport equation. This approach assumes that molecular interactions can be approximated by those of Lennard–Jones particles with a potential function. Dynamic viscosity of a gas

μ_{i}

is calculated using the kinetic theory of gases, which relates this property to temperature

T

, molar mass

M_{i}

, collision diameter σ, and collision integral

Ω_{μ}

. The pure-component molecular weights used in the Wilke mixing rule are given in Appendix A, Table A1.

μ_{i} = 2.669 \times 10^{- 5} \frac{\sqrt{T \cdot M_{i}}}{σ^{2} \cdot Ω_{μ}}

(5)

Reduced collision integral is calculated by considering the possible trajectories of particles during their approach. An empirical correlation relates

Ω_{μ}

to temperature.

Ω_{μ} = \frac{1.16145}{T^{* 0.14874}} + \frac{0.52487}{e x p ({0.77320 T}^{*})} + \frac{2.16178}{e x p ({2.43787 T}^{*})}

(6)

where the reduced temperature

T^{*}

.

T^{*} = \frac{T}{\frac{ε}{k_{B}}}

(7)

In this expression,

ε

is the depth of the Lennard–Jones potential well, and

k_{B}

is the Boltzmann constant (1.380649 × 10⁻²³ J/K). The term

\frac{ε}{k_{B}}

has units of kelvins, making

T^{*}

dimensionless. Physically,

T^{*}

represents the ratio of the system temperature to the characteristic energy of intermolecular interactions. The Lennard–Jones parameters adopted for collision-integral evaluations are summarized in Appendix A, Table A3.

According to [28] viscosity of a diluted gas is related to the mole fraction

χ_{i}

, and the viscosity of the diluted gas component

i

, following Wilke’s equation.

μ_{m i x} = \sum_{i} \frac{χ_{i} μ_{i}}{\sum_{j} χ_{j} φ_{i j}}

(8)

Binary weighting factor

φ_{i j}

expressed as a function of gas viscosity and the molecular weight of component as follows:

φ_{i j} = \frac{{[1 + {(\frac{μ_{i}}{μ_{j}})}^{0.5} {(\frac{M_{i}}{M_{j}})}^{0.25}]}^{2}}{{[8 (1 + \frac{M_{i}}{M_{j}})]}^{0.5}}

(9)

2.1.2. Peng–Robinson Equation of State for Non-Ideal Gas Behavior

To accurately represent the non-ideal behavior of gas mixtures within the HFMM, Peng–Robinson (PR) cubic equation of state is implemented in the mathematical model. This approach allows for the calculation of the real molar volume for both the retentate and permeate streams at each spatial position along the membrane, thus providing a more rigorous description of volumetric and transport phenomena, particularly under elevated pressure conditions [29].

The Peng–Robinson equation of state is given by the following:

P = \frac{R T}{V_{m} - b_{i}} - \frac{a_{i} α_{i} (T)}{V_{m}^{2} + 2 b_{i} V_{m} - {b_{i}}^{2}}

(10)

where

P

is the pressure,

T

is the temperature,

V_{m}

is the molar volume, and

R

is the universal gas constant. The parameters

a_{i}

and

b_{i}

are determined from the critical temperature

T_{c}

, critical pressure

P_{c}

, and acentric factor

ω

of each species.

Computation of the parameters

a_{i}

and

b_{i}

for Peng–Robinson equation of state in multicomponent mixtures requires the use of appropriate mixing rules, as well as the estimation of temperature-dependent correction factors for each component.

First, for each pure component i, the parameters are defined as follows:

a_{i} = 0.45724 \cdot \frac{R^{2} T_{c, i}^{2}}{P_{c, i}}

(11)

b_{i} = 0.07780 \cdot \frac{R T_{c, i}}{P_{c, i}}

(12)

where

T_{c, i}

and

P_{c, i}

are the critical temperature and pressure of component i, respectively. The term

α_{i}

introduces temperature dependence, calculated as follows:

α_{i} (T) = {[1 + k_{i} (1 - \sqrt{\frac{T}{T_{c, i}}})]}^{2}

(13)

k_{i} = 0.37464 + 1.54226 ω_{i} - 0.26992 ω_{i}^{2}

(14)

For mixtures, global parameters

a_{m i x}

and

b_{m i x}

are obtained by classical mixing rules:

a_{m i x} = \sum_{i} \sum_{j} χ_{i} χ_{j} \sqrt{a_{i} α_{i} (T) a_{j} α_{j} (T)} (1 - k_{i j})

(15)

b_{m i x} = \sum_{i} χ_{i} b_{i}

(16)

In these equations,

χ_{i}

and

χ_{j}

are the mole fractions of components i and j, respectively. Binary interaction parameter

k_{i j}

is an empirical coefficient used to account for non-ideal interactions between unlike pairs of molecules in the mixture, and it is usually obtained from experimental data or taken as zero in the absence of specific information.

Final values of

a_{m i x}

and

b_{m i x}

are then substituted into the cubic Peng–Robinson equation of state to compute the molar volume or compressibility factor for the mixture under the specified conditions of temperature, pressure, and composition [29].

The Peng–Robinson EoS is widely recognized as a robust model for CH₄/CO₂ mixtures under the moderate operating conditions (8 atm, 298.15 K) of this study. Its ability to provide an excellent compromise between simplicity and accuracy for CO₂-rich systems in the subcritical gas phase, with reported errors for compressibility below 1%, makes it a suitable choice for this application [30]. Furthermore, its successful implementation in recent molecular simulation studies of gas separation in membranes under comparable conditions validates its predictive capabilities [31]. The model’s robustness is also supported by its application in studies under even more demanding conditions of higher pressure [32]. The critical properties and acentric factors employed are listed in Appendix A, Table A2.

2.1.3. Permeance: Baseline Assumption and Post-Validation Exploration

The effective

{P m}_{i}

observed in operation can shift with temperature, pressure, and composition (competitive sorption, plasticization), as well as with the architecture/thickness of the selective layer and module non-idealities (pressure drops, film coefficients, non-isothermality). The specialized literature recommends, when reliable parameters are not available, conducting parametric evaluations to bound their impact on module performance and design [33].

At the material/architecture level, hollow-fiber TFC membranes with ultrathin selective layers have reported CO₂ permeances on the order of 2.6–2.7 × 10³ GPU with CO₂/N₂ selectivities of ~21; the authors note that thickness estimation assumed constant permeability despite evidence of thickness dependence and operating-condition variability, reinforcing that effective permeance is not static [34].

At the module/process level, performance depends on the integration of the separation unit (areas, stage cut, pressures) and its couplings with other units; these factors modify driving forces and fluxes, explicitly justifying the use of parametric scans when working with tabulated permeances [35].

Regarding constitutive modeling, frameworks that integrate molecular dynamic (MD)+ free-volume theory and pressure/composition-dependent permeability improve agreement with data but require nontrivial parameters; in parallel, non-ideal models (pressure-dependent permeability, substrate resistance, real-gas behavior) have been shown to reduce deviations relative to the ideal approach, supporting parametric evaluation when a full parametrization is not yet available [36].

Adopted protocol. Without introducing a new constitutive model at this stage, a post-validation exploration was carried out by varying only the CH₄ permeance in {0.1×, 1×, 10×} relative to its baseline. This approach was chosen as a first-order sensitivity analysis to evaluate the system’s response to perturbations in the less permeable species, a key factor controlling methane purity in the retentate. We acknowledge that a complete analysis would involve co-variation of CO₂ permeance to fully map the impact on selectivity, which remains a valuable direction for future studies. The scan reports eighteen outputs spanning: total volumes on each side (retentate and permeate); gas compositions on each side (CO₂ and CH₄ mole fractions); pressures on each side; component molar flows of CO₂ and CH₄ on each side; component volumes of CO₂ and CH₄ on each side; and the mixture viscosity on each side. Together, these metrics quantify the plausible impact on flow splits, compositions, and pressure profiles within a range bounded by the literature.

2.2. Statistical Method

This section details the systematic methodology, illustrated in Figure 3, used to reduce the initial set of 18 candidate predictors to a compact, high-quality subset for ANFIS modeling. The workflow is designed to ensure the final variables are not only statistically powerful but also physically meaningful.

Figure 3. Statistical analysis framework for predictor dimensionality reduction.

The process begins with an Exploratory Analysis and Benchmarking phase, where a multiple linear regression (MLR) model incorporating all 18 variables is constructed. This initial model serves as a baseline, and its performance is evaluated using the coefficient of determination (R²) and mean absolute error (MAE). Its statistical validity is rigorously checked through a series of diagnostic plots, including residual analysis and Q-Q plots, to assess underlying assumptions like homoscedasticity and normality.

Subsequently, a Correlation and Multicollinearity Analysis is performed. A Pearson correlation matrix is generated to identify strong linear relationships between pairs of variables. This is complemented by a more robust multicollinearity assessment using the Variance Inflation Factor with a Ridge penalty (VIF-Ridge), which quantifies how much the variance of an estimated regression coefficient is increased due to collinearity with other predictors.

In parallel, an Individual Performance Evaluation is conducted for each predictor. By fitting a normalized simple linear regression (NSR) for each variable against the target output, we assess their individual predictive contributions, again using R² and MAE as key metrics.

The core of the selection process is the Physical-Statistical Criterion, where all the evidence is synthesized. The statistical findings—from the benchmark model, correlation matrix, VIF scores, and individual regressions—are combined with physical knowledge of the membrane separation process. This crucial step ensures that the selected variables are not just statistically independent but also mechanistically relevant.

Finally, a Quantitative Confirmation is performed. A new, reduced MLR model is built using only the four selected variables. Its performance is re-evaluated to confirm that the parsimonious model retains high predictive accuracy. This multi-stage filtering workflow systematically eliminates weak and redundant variables, guaranteeing that the final input set for the ANFIS model is both robust and interpretable.

All measurements are arranged into the standard regression form:

x = [\begin{matrix} x_{11} & x_{12} & x_{13} & \dots & x_{1 k_{0}} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2 k_{0}} \\ x_{31} & x_{32} & x_{33} & \dots & x_{3 k_{0}} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ x_{n 1} & x_{n 2} & x_{n 3} & \dots & x_{n k_{0}} \end{matrix}] y = [\begin{matrix} y_{1} \\ y_{2} \\ ⋮ \\ y_{n} \end{matrix}]

(17)

where

n

denotes the number of experimental observations (rows) and

k_{0}

is the number of initial predictors (

k_{0} = 18

). This succinct representation facilitates matrix-based computations and algorithmic efficiency.

To explore the relationships between variables, scatter plots are employed to visualize ordered pairs (

x_{s, q}

,

y_{s}

). It’s not just a graph but a visual representation of the relationship between the two variables. When most points align closely with a straight line, the correlation is classified as linear. Conversely, the correlation is considered nonlinear if the points follow a curved pattern. However, if no discernible pattern is observed among the ordered pairs, it indicates the absence of a relationship between the two variables. To depict the curve or regression line that best fits the distribution of the ordered pairs, Equation (18) is applied [36].

y = β_{0} + β_{1} x_{s, 1} + β_{2} x_{s, 2} + \dots + β_{q} x_{s, q}

(18)

where

β_{0}

is the intercept and

β_{q}

are the regression coefficients associated with each predictor.

2.2.1. Data Normalization

To mitigate bias from differing variable scales, all features and the target variable are standardized to zero mean and unit variance:

x_{s, q}^{(n)} = \frac{x_{s, q} - μ_{x_{q}}}{σ_{x_{q}}}, y_{s}^{(n)} = \frac{y_{s} - μ_{y}}{σ_{y}}

(19)

where

μ_{x_{q}}

and

σ_{x_{q}}

are vectors of column-wise means and standard deviations of the

q

predictor across all

n

samples.

μ_{y}

and

σ_{y}

are the mean and standard deviation of the dependent variable

y

.

2.2.2. Normalized Multiple Linear Regression

A normalized multiple linear regression model was fitted to the standardized data:

y^{(n)} = X^{(n)} β

(20)

where

y^{(n)}

is the vector of normalized predicted values.

X^{(n)}

is the normalized design matrix (dimensions

n \times k_{0}

), with each column scaled to zero mean and unit variance, and

β

is the vector (dimensions

k_{0} \times 1

).

2.2.3. Parameter Estimation via Gradient Descent

The coefficients

β

were optimized by minimizing the mean squared error (MSE) cost function:

J (β) = \frac{1}{2 n} \sum_{s = 1}^{n} {(y_{s}^{(n)} - {\hat{y}}_{s}^{(n)})}^{2}

(21)

Using stochastic gradient descent (SGD) with Learning rate

α = 0.01

, Iterations = 1000 (ensuring convergence to local minima) and an update rule according to the following:

β^{(t + 1)} = β^{(t)} - α \nabla_{β} J β^{(t)}

(22)

where the gradient

\nabla J

is computed as follows:

\nabla_{β} J = - \frac{1}{n} {(X^{(n)})}^{T} (y^{(n)} - y^{(n)})

(23)

Normalized predictions are transformed back to the original scale for direct comparison with experimental data. We evaluate performance with two metrics: the coefficient of determination (R²), defined in Equation (24), and the mean absolute error (MAE), defined in Equation (25).

R^{2}

∈ [0,1] quantifies the fraction of variance in the target explained by the model, values closer to 1 indicate a better fit. For completeness, the Pearson correlation coefficient

r

∈ [−1, 1] measures the strength and direction of linear association between variables [34].

R^{2} = 1 - \frac{\sum_{s = 1}^{n} {(y_{s} - {\hat{y}}_{s})}^{2}}{\sum_{s = 1}^{n} {(y_{s} - \bar{y})}^{2}}

(24)

M A E = \frac{1}{n} \sum_{s = 1}^{n} |y_{s} - {\hat{y}}_{s}|

(25)

Normalized simple linear regression is additionally performed for each independent variable

{x;}_{q}^{(n)}

, repeating the same procedure and metric calculations to identify the individual contribution of each variable to the model.

2.2.4. Multicollinearity Analysis

Given that collinearity among predictors can inflate the variance of regression coefficients and undermine the numerical stability of the model. This procedure enabled the systematic elimination of highly collinear and weakly predictive variables, reducing the original pool of 18 candidate predictors to the four most informative and mutually independent inputs for ANFIS training.

Pearson’s correlation coefficients are computed according to [37] on the normalized predictor matrix

X^{(n)}

.

ρ_{q l} = \frac{\sum_{s = 1}^{n} (x_{s, q}^{(n)} - {\bar{x}}_{q}^{(n)}) (x_{s, l}^{(n)} - {\bar{x}}_{l}^{(n)})}{\sqrt{\sum_{s = 1}^{n} {(x_{s, q}^{(n)} - {\bar{x}}_{q}^{(n)})}^{2}} \sqrt{\sum_{s = 1}^{n} {(x_{s, l}^{(n)} - {\bar{x}}_{l}^{(n)})}^{2}}}

(26)

With this, it was also possible to construct a correlation matrix which compacts all the

ρ_{i j}

into a single object.

R_{x x} = {[ρ_{q l}]}_{q, l = 1}^{k_{0}}

(27)

Inspecting the correlation matrix quickly flags pairs with

ρ_{q l}

near 1, it remains a subjective heuristic and tells us nothing about how much multicollinearity is affecting our coefficient estimates. To quantify that impact, we turn to the variance inflation factor

{V I F}_{i}

for predictor

q

, quantifying how much the variance of its estimated coefficient

{\hat{β}}_{q}

is inflated due to multicollinearity. Formally,

{V I F}_{q} = \frac{V a r ({\hat{β}}_{q}| w i t h m u l t i c o l l i n e a r i t y)}{V a r ({\hat{β}}_{q}| n o m u l t i c o l l i n e a r i t y)} = {[{(R_{x x})}^{- 1}]}_{q q}

(28)

According to [38],

{V I F}_{q}

is the

q - t h

diagonal element of

{(R_{x x})}^{- 1}

. Values

{V I F}_{q}

> 10 indicate severe collinearity. However, when there is severe collinearity, the correlation matrix

R_{x x}

can become close to singular and lead to extreme or undefined values of

{V I F}_{q}

. To address this problem, we introduce a Ridge penalty in the collinearity diagnosis. Strictly speaking, we define the “penalized” version of the

V I F

as follows:

{V I F}_{q, r i d g e} = {[{{(R_{x x} + λ I_{k_{0}})}^{- 1} R}_{x x} {(R_{x x} + λ I_{k_{0}})}^{- 1}]}_{q q}

(29)

Ridge penalty hyperparameter

λ

controls the degree of regularization and was selected (λ = 0.03) via literature

{V I F}_{q, r i d g e}

.

I_{k_{0}}

is the identity matrix of size

(k_{0} \times k_{0})

, where

p

is the number of original predictors.

2.2.5. Validation of Model Assumptions via Diagnostic Plots

To ensure that the normalized linear regression satisfies its underlying assumptions prior to ANFIS training, the following diagnostic plots are systematically generated:

Residuals vs. Predicted Values: Plot the residual against the fitted value y_k. This visualization is used to assess homoscedasticity—constancy of error variance across the prediction range—and to detect any funnel-shaped or nonlinear patterns that would indicate model misspecification.
Q-Q-Plot of Residuals: Construct a quantile-quantile plot comparing the empirical quantiles of the residuals to theoretical quantiles of a standard normal distribution. Close adherence of points to the 45° reference line will validate the assumption of approximate Gaussianity of the errors, which is important for subsequent inferential procedures.
Histogram of Residuals: Generate a normalized histogram of residuals to examine the shape of their distribution—specifically symmetry, peakedness, and tail behavior. This complements the Q-Q-plot by visually confirming the presence (or absence) of skewness or heavy tails.
Actual vs. Predicted Values: Plot observed responses $y_{k}$ versus predicted values ${\hat{y}}_{s}$ alongside the identity line $y_{s} = {\hat{y}}_{s}$ . A tight cluster of points around this line will demonstrate high overall predictive accuracy and minimal systematic bias.
Absolute Error per Sample: Compute and plot the absolute error for each sample index. This plot helps to identify isolated outliers or regions where the model’s performance deteriorates, thereby guiding further data investigation or model refinement.

By applying these five diagnostic plots, one can confirm the validity of homoscedasticity and normality assumptions, quantify overall goodness-of-fit, and detect any influential observations that may warrant additional scrutiny before proceeding with ANFIS modeling.

2.3. Intelligent Modeling with ANFIS

The technique implemented in this study is the ANFIS, which integrates two intelligent methods: ANNs and fuzzy logic. This hybrid approach combines the learning capacity of ANNs with the interpretability of fuzzy systems, effectively addressing the limitations inherent to each individual technique as a universal approximator [39,40,41].

Classical ANFIS architecture, illustrated in Figure 4, consists of five layers:

Figure 4. Architecture scheme of a two-input ANFIS network.

Layer 1 (Membership Functions): Each input $x_{j}$ is fuzzified using membership functions (MFs) that transform crisp inputs into membership degrees $μ$ ranging from 0 to 1. In this work, different membership functions were tested, such as Gaussian, triangular, sigmoidal, and bell functions. After a comparative analysis, the membership function with the best performance was the sigmoidal one, as presented and discussed later in Table 10. The sigmoidal membership function is defined as follows:

$μ_{M F} (x; a_{M F}, c_{M F}) = \frac{1}{1 + e^{- a_{M F} (x - c_{M F})}}$

(30)

where parameters

a_{M F}

and

c_{M F}

control the slope and center of the function, respectively.

Layer 2 (Rule Firing Strength): Computes the firing strength $w_{r}$ of each fuzzy rule by aggregating membership degrees of all inputs via product:

$w_{r} = \prod_{m = 1}^{Q} μ_{M F} A_{r m} (x_{m}), r = 1, \dots, M,$

(31)

where

M

is the number of fuzzy rules and

Q

is number of inputs.

Layer 3 (Normalization): Normalizes firing strengths:

${\bar{w}}_{r} = \frac{w_{r}}{\sum_{r^{’} = 1}^{M} w_{r^{’}}}$

(32)
Layer 4 (Consequents): Each rule computes an output as a linear function of inputs weighted by ${\bar{w}}_{r}$ :

$f_{r} = {\bar{w}}_{r} \cdot (\sum_{m = 1}^{Q} b_{r m} x_{m} + c_{r})$

(33)

where

b_{r m}

and

c_{r}

are consequent parameters.

Layer 5 (Output Aggregation): Aggregates all outputs to produce final prediction:

$\hat{y} = \sum_{r = 1}^{M} f_{r} = \frac{\sum_{r = 1}^{M} w_{r} f_{r}}{\sum_{r = 1}^{M} w_{r}}$

(34)

Neural network components enable continuous adjustment of weights and parameters during learning. The training method applied is a hybrid algorithm combining least squares estimation for the consequent parameters and gradient descent for nonlinear membership parameters

a_{M F}

, and

c_{M F}

[39,40].

The error function minimized during training is the mean squared error:

E = \frac{1}{N_{s}} \sum_{t = 1}^{N_{s}} {(y_{t} - {\hat{y}}_{t})}^{2}

(35)

where

y_{t}

and

{\hat{y}}_{t}

are actual and predicted outputs respectively. Update rule for any parameter

α

follows gradient descent:

Δ α = - η \frac{\partial E_{p}}{\partial α}

(36)

with learning rate

η

controlling step size.

To capture temporal dependencies, input vectors include up to 20 previous samples (lags), generating delayed inputs such as the following:

{\hat{v}}_{{C H}_{4}}^{R} (t) = f [\begin{matrix} V^{R e} (t - d) & {P m}_{{C H}_{4}} (t - d) & {μ_{m i x}}^{R e} (t - d) & P^{R e} \end{matrix} (t - d)], d = 1,2, \dots, 20,

(37)

Data for these delayed inputs are generated from white-box membrane transport model and loaded for ANFIS training. Variable selection is performed as previously described using correlation analysis to ensure relevance and reduce redundancy.

To ensure a robust and interpretable ANFIS model, configuration and training parameters are summarized in Table 5. Model uses four selected input variables—permeance

{P m}_{{C H}_{4}}

, retentate volume

V^{R e}

, retentate pressure

P^{R e}

, and retentate mixture viscosity

μ_{m i x}^{R e}

—with methane volume in retentate

{\hat{v}}_{{C H}_{4}}^{R}

as output. Temporal dependencies are incorporated by evaluating up to 20 time lags per input, allowing the model to capture the dynamic behavior of the membrane process. Sigmoidal membership functions are primarily used. Each input variable has one membership function, balancing simplicity and expressive power.

Table 5. Summary of ANFIS Model Configuration and Training Parameters.

While a greater number of membership functions (MFs) can enhance an ANFIS model’s ability to capture complex nonlinearities, this typically comes at the cost of significantly increased processing time. Our findings indicate that for this application, a model with a minimal set of MFs is not only sufficient but optimal. The marginal gains in accuracy from more complex configurations did not justify the exponential increase in computational demand, rendering them impractical for the real-time estimation of CH₄ volume in the membrane retentate. Consequently, our chosen architecture strikes a robust balance between high predictive fidelity and operational efficiency.

Dataset is split into 70% training and 30% testing, maintaining strict separation to validate generalization. Training employs a hybrid method combining least squares estimation for linear consequent parameters and gradient descent for nonlinear membership parameters

a_{M F}

and

c_{M F}

. Parameters

a_{M F}

and

c_{M F}

are initialized randomly within defined ranges. Training proceeds until the mean squared error (MSE) for a maximum of 50 iterations is reached. Likewise, evaluation time (Eval Time) is computed. Performance metrics (Eval Time, MSE, RMSE, MAE, and

R^{2}

) are computed for both training and testing sets to evaluate overall accuracy. Final model selection is based on the lowest test set MSE among all evaluated lag configurations, ensuring optimal predictive performance. Validation includes graphical comparisons of predicted versus actual values and residual analysis to confirm model reliability. This rigorous framework enables ANFIS to effectively capture the nonlinear and temporal characteristics of the HFMM. We propose implementing a real-time control scheme coupled with bio-inspired algorithms (e.g., PSO, Gray Wolf Optimizer (GWO), and DE) for process tuning and optimization [42].

3. Results

This section presents and analyzes the results obtained from the developed hybrid model, which integrates a mechanistic multicomponent transport model for hollow fiber membranes (HFMM) with an Adaptive Neuro-Fuzzy Inference System (ANFIS). The section is organized into three key subsections to demonstrate the model’s validity, the variable selection methodology, and the predictive performance of the final model.

3.1. Mathematical Modeling of HFMM Results and Validation

The accuracy and predictive capacity of the proposed mathematical model are first evaluated by comparing its results against experimental data and the reference Ko model, as shown in Figure 5. Comparison focuses on two primary performance indicators for the membrane system: methane purity (Figure 5a) and methane recovery (Figure 5b), both as functions of feed flow rate.

Figure 5. Validation of the proposed membrane model using experimental results and the Ko model. (a) Methane purity as a function of feed flow rate. (b) Methane recovery as a function of feed flow rate.

As observed in Figure 5a, all models predict a decreasing trend in methane purity with increasing feed flow rate, which is consistent with the expected reduction in separation efficiency at higher feed rates. Notably, the proposed model tracks the experimental data more closely than the Ko model, particularly at intermediate and high feed flow rates. While the Ko model significantly underestimates methane purity under these conditions, the proposed model shows smaller deviations and better alignment with experimental results, indicating a more accurate representation of the underlying transport mechanisms.

In terms of methane recovery (Figure 5b), both models reproduce the expected increasing trend as the feed flow rate rises. However, the proposed model displays improved agreement with experimental data across tested feed rates, especially at low and moderate flow values where the Ko model exhibits more pronounced discrepancies. This consistent agreement highlights the ability of the proposed model to capture both the magnitude and trend of experimental recovery data.

To provide a quantitative assessment of model accuracy, a detailed error analysis was conducted using the relative error and mean absolute error (MAE) metrics for each experimental case (Figure 6 and Figure 7). Relative error distributions for methane purity (Figure 6a) and methane recovery (Figure 6b) show that, although the Ko model achieves lower errors at the lowest feed flow rates, its accuracy diminishes as flow increases. In contrast, the proposed model shows higher errors at low flow but rapidly improves with increasing feed, outperforming the Ko model in the higher range of operational relevance.

Figure 6. Relative error of the Ko model and the proposed model for the prediction of (a) methane purity and (b) methane recovery as a function of feed flow rate. Results correspond to the same experimental conditions as in Figure 5.

Figure 7. Mean absolute error (MAE) of the Ko model and the proposed model for the prediction of (a) methane purity and (b) methane recovery as a function of feed flow rate. Results correspond to the same experimental conditions as in Figure 5.

MAE analysis (Figure 7a,b) reinforces this observation: while the Ko model maintains a slight advantage at the lowest feed, the proposed model yields lower MAE values at moderate and high feed flows for both purity and recovery. These results confirm that the proposed model delivers robust and reliable predictions under practical operating conditions, supporting its suitability for process optimization and scale-up in HFMM.

3.2. Statistical Method Results

A comprehensive statistical analysis is carried out to evaluate the predictive performance and interrelationships of all candidate input variables considered for modeling. The objective is to identify informative and least redundant subsets of variables for training a subsequent ANFIS model. Table 6 summarizes the results of individual linear regressions, including coefficient of determination (R²), mean absolute error (MAE) and variance inflation factor

{V I F}_{q, r i d g e}

.

Table 6. Initial individual variable regression metrics for model reduction.

Among evaluated variables, Retentate volume and permeate volume showed near-perfect linearity (

R^{2}

≈ 0.9998–0.9999) and extremely low error (MAE = 5.47 × 10⁻⁸). However, their low

{V I F}_{q, r i d g e}

values (0.52) and very high pairwise correlation (ρ > 0.93), as depicted in the Pearson correlation matrix (Figure 8), revealed significant redundancy. Including both would likely lead to multicollinearity without improving model expressiveness.

Figure 8. Pearson correlation matrix between all candidate retentate and permeate variables included in the variable selection procedure. The heatmap indicates the degree of linear relationship between each pair of variables.

Retentate Pressure demonstrated strong predictive ability (

R^{2}

= 0.8356) and moderate

{V I F}_{q, r i d g e}

(5.62), indicating it brings valuable and relatively unique information. Although its

{V I F}_{q, r i d g e}

suggests some correlation with other inputs, it remained within an acceptable range for inclusion. On the other hand, permeance had a lower R² (0.4883) but was favored due to its meaningful physical interpretation, relatively low

{V I F}_{q, r i d g e}

(3.70), and strong negative correlations with both Retentate Volume and Retentate Pressure (ρ ≈ −0.90), suggesting it captures orthogonal aspects of the system behavior.

Regarding viscosity, Permeate Mix Viscosity achieved a high R² (0.8992) and low MAE, indicating good predictive potential. However, its strong correlations with other variables (ρ > 0.9) signaled redundancy. Conversely, retentate mix viscosity showed moderate predictive performance (

R^{2}

= 0.4305) and acceptable multicollinearity (

{V I F}_{q, r i d g e}

= 2.18) but is retained for its physical relevance and lower correlation with other selected variables (ρ < 0.5), supporting its role in improving model robustness.

The correlation matrix (Figure 8) provides a comprehensive overview of the linear interrelationships among all candidate variables. Several variables exhibited strong pairwise correlations (ρ > 0.93), indicating a high degree of redundancy in the information they convey. Notably, variables such as retentate CH₄ flow, permeate CH₄ flow, retentate CO₂ flow, and permeate CO₂ fraction were found to be tightly interrelated with other process variables—particularly retentate volume, permeate volume, and retentate pressure—all of which already captured the principal trends of the system.

Despite achieving near-perfect individual

R^{2}

scores in their univariate regressions, these highly correlated variables were not selected for the final model due to their limited marginal contribution to the overall predictive power and their potential to introduce multicollinearity. This decision is further supported by their very low

{V I F}_{q, r i d g e}

values (typically < 0.60), which mathematically reflect that variance in these predictors can be almost entirely explained by other variables in the model. Including such variables would not only complicate the model unnecessarily but also increase the risk of overfitting and reduce the generalizability of the final predictive tool, particularly in real-world conditions with process noise and variability.

Moreover, from a physical standpoint, flows and fractions are often dependent on the outcomes of pressure, volume, and permeance conditions, and not independent drivers of system behavior. Thus, their high correlation with physically meaningful variables such as retentate pressure and permeance reinforces the decision to exclude them in favor of more mechanistically grounded predictors.

To further validate the reliability of the selected model, residual diagnostics are conducted, as illustrated in Figure 9 and Figure 10. Figure 9a shows the absolute error per sample across the dataset. The majority of errors are small and centered, with some increase observed near sample boundaries. This pattern is consistent with edge effects, where predictive models may exhibit slightly reduced accuracy due to limited data density or extrapolation beyond the core data cloud.

Figure 9. (a) Absolute error for each sample in the validation dataset for the multiple linear regression model. (b) Histogram of residuals showing the distribution and magnitude of errors across all samples. These visualizations provide a direct assessment of the model’s prediction accuracy and the normality of residuals, which are essential for evaluating regression model assumptions.

Figure 10. (a) Residuals as a function of predicted values for a multiple linear regression model, highlighting any trends or heteroscedasticity in model errors. (b) Q-Q plot of residuals, used to assess the normality of errors and the adequacy of linear model assumptions.

Figure 9b presents the histogram of residuals, which exhibits a nearly symmetric bell-shaped distribution, closely approximating normality. This supports the assumption of homoscedastic and unbiased residuals, essential conditions for the validity of linear regression estimators.

Further insight is offered by the residuals vs. predicted plot (Figure 10a), where a slight curvature is observed. While mostly centered around zero, this pattern suggests the presence of mild heteroscedasticity or a weak nonlinear component in data that a linear model may not fully capture. Nevertheless, deviation remains minor and does not compromise the model’s overall fit or its applicability for variable screening.

The scatter plot of actual versus predicted values (Figure 11a) provides a visual confirmation of the regression model’s predictive capability. Most data points align closely with the ideal 1:1 reference line, indicating a strong agreement between observed outputs and those estimated by the model. This tight clustering suggests that the regression model effectively captures dominant trends and intrinsic relationships among selected input variables. Only minimal deviations are observed at the extremes, which is typical in physicochemical models where non-linearities and edge-case behaviors may arise due to unmodeled secondary effects.

Figure 11. Comparison of actual vs. predicted values for the multiple linear regression models. (a) Performance of the model with the initial set of 18 variables. (b) Performance of the reduced model with the 4 selected variables. The dashed line indicates the ideal 1:1 relationship.

Statistical performance of regression model is further substantiated by quantitative metrics presented in Table 7. Model achieved a global coefficient of determination of

R^{2}

= 0.999, indicating that 99.9% of the variability in the target variable is explained by selected predictors. Additionally, the normalized mean absolute error (MAE) is 1.292, reflecting a very low average deviation between predicted and actual values relative to the output scale. Together, these results support the robustness and high fidelity of regression models.

Table 7. Comparison of regression model performance metrics.

To complement the visual and numerical analysis, final regression model’s structure, including linear, interaction, and higher-order terms, is presented in Table 8. This table lists the estimated coefficients

β_{q}

, which quantifies the relative influence of each term used in the polynomial regression model. Inclusion of higher-order coefficients enables the model to capture subtle nonlinear dependencies without introducing excessive complexity, while preserving interpretability and numerical stability.

Table 8. The values of

β_{q}

.

Following a comprehensive analysis that considered multiple statistical criteria—including

R^{2}

, MAE,

{V I F}_{k, r i d g e}

, and Pearson correlation coefficients—a reduced set of four variables was selected as optimal inputs for the subsequent ANFIS training process: Permeance, retentate volume, retentate pressure, and retentate mix viscosity. These variables were chosen based on their ability to capture complementary aspects of the process. Permeance represents the intrinsic transport property of the membrane, influenced by temperature, pressure, and composition. Retentate volume and retentate pressure reflect the operational state of the feed side and are strongly tied to driving forces for mass transfer. Retentate Mix Viscosity, while having moderate

R^{2}

, was retained for its mechanistic relevance and lower correlation with other selected inputs, thus contributing unique variance to the model. Importantly, the selected variables offer a balanced trade-off: they exhibit high predictive power, low to moderate multicollinearity (as evidenced by acceptable

{V I F}_{q, r i d g e}

values), and are rooted in well-understood physical mechanisms governing membrane-based gas separation. This selection aligns with the principle of model parsimony—minimizing the number of inputs without compromising accuracy or generalizability.

Several other variables, despite exhibiting excellent individual regression metrics (e.g., Permeate Flow CH₄, Retentate Flow CO₂, Permeate Fraction CH₄), were deliberately excluded due to high pairwise correlations (ρ > 0.90), excessively low

{V I F}_{q, r i d g e}

values (<0.6), or overlapping information content with already selected predictors. Additionally, some variables lacked direct mechanistic interpretability, which could hinder extrapolation or explainability of the model in broader operating domains.

Nonetheless, it is important to recognize that variable selection is inherently context-dependent. Current selection is optimized for the specific objective of training a robust, interpretable ANFIS model for this dataset and operating window. Future studies may consider evaluating alternative or expanded subsets of input variables—such as combinations including permeate mix viscosity or permeate fraction CH₄—to explore whether these configurations offer improved performance, particularly under varying process conditions, multicomponent mixtures, or in the presence of dynamic disturbances.

In summary, the selected reduced-input model demonstrates excellent agreement with observed data (see Table 9), both statistically and physically. This outcome provides a strong foundation for training intelligent models like ANFIS, while avoiding pitfalls of overfitting, multicollinearity, and loss of physical meaning.

Table 9. Criteria and Rationale for Selecting Input Variables for ANFIS Modeling.

3.3. ANFIS Simulation Results

The dataset used for training the ANFIS model is characterized by its high complexity and variability, justifying the choice of an advanced modeling strategy.

Figure 12 presents the output variable used as the target for ANFIS training:

{\hat{v}}_{{C H}_{4}}^{R}

for each sample. Much like several of the input variables, the target output displays a highly scattered profile, characterized by frequent peaks and valleys and a lack of clear global trends across the dataset. Distribution is fairly symmetric, with values spanning a broad range; notably, there are no obvious outliers or missing data, though some points fall close to zero—most likely a result of diversity in simulated operating conditions or presence of measurement noise.

Figure 12. Experimental values of retentate

{\hat{v}}_{{C H}_{4}}^{R e}

(

\hat{y}

) for all samples in the dataset. This variable is used as the output (target) in the ANFIS modeling.

This considerable variability in output further increases the challenge for any predictive model, as it must learn to map a set of complexes, nonlinear, and potentially noisy relationships between inputs and target values. When combined with highly variable input variables (as shown in Figure 13), this scattered output underscores the inherent complexity of the modeled gas separation process. Altogether, these figures highlight the necessity for advanced modeling strategies such as ANFIS, which are specifically designed to capture nonlinear dependencies and to provide robust predictions even in the presence of high data dispersion and noise—scenarios where traditional modeling approaches often struggle.

Figure 13. Input variables used for the ANFIS model: (a)

{P m}_{{C H}_{4}}

(

x_{1}

); (b)

V^{R e}

(

x_{2}

); (c)

P^{R e}

(

x_{3}

); (d)

μ_{m i x}^{R e}

(

x_{4}

). Each plot shows the respective variable as a function of sample number for the entire dataset.

Figure 13 provides a comprehensive visualization input variables selected for the ANFIS model: (a)

{P m}_{{C H}_{4}}

, (b)

V^{R e}

, (c)

P^{R e}

, and (d)

μ_{m i x}^{R e}

, each plotted as a function of sample number. This figure serves to illustrate both the dynamic range and variability present in the input dataset used for training and evaluating the ANFIS model.

The input variables depicted in Figure 13 show a combination of highly variable and relatively stable features, reflecting the multifaceted and sometimes unpredictable nature of real-world process data. This diversity underscores the need for robust and flexible modeling strategies—such as ANFIS—that are capable of learning complex, nonlinear, and context-dependent relationships in the presence of noisy or nonstationary inputs.

As shown, the majority of these variables exhibit marked dispersion and apparently random fluctuations throughout the entire sample set, without any discernible temporal trends, periodicities, or seasonality. This high degree of scatter highlights the complexity and nonstationary character of the system, emphasizing the modeling challenge posed by such input data.

Focusing on permeance and retentate volume, both variables display substantial variability and a wide dynamic range. These characteristics may reflect either a broad spectrum of operational scenarios—including changes in membrane properties, feed conditions, or process upsets—or the synthetic or experimental nature of the underlying dataset. The absence of smooth transitions or recurring patterns suggests that the data either samples a wide range of experimental data or intentionally incorporates variability to enhance the generality of the resulting model.

In sharp contrast, retentate pressure remains almost invariant across all samples, presenting only minor fluctuations around a stable mean value. This pattern is typical of processes where pressure is tightly controlled or regulated, ensuring that this variable does not introduce confounding effects in the modeling process. Such stability is often desirable in membrane systems to maintain separation efficiency and process safety.

Finally, retentate mix viscosity displays behavior that is generally stable, with values clustered around a central mean. Nonetheless, several isolated peaks are observed, which could be attributed to specific simulation scenarios, rare measurement anomalies, or occasional transient events in the process. These excursions, though infrequent, might provide a model with valuable information about edge-case behavior or process sensitivity.

Table 10 presents a comparative analysis considering different metrics using several

μ_{M F}

, such as gaussian, sigmoidal, bell, and triangular. Likewise, datasets used are separated into 700 samples for training and 300 samples for validation. Full analysis can be consulted in Appendix A, in Table A4, Table A5 and Table A6. The results obtained show that sigmoidal

μ_{M F}

has better performance. Furthermore, sigmoidal

μ_{M F}

has only two parameters, which allows for a reduction in evaluation time.

Table 10. Comparative Analysis of Membership Function Performance in an ANFIS Model.

$μ_{M F}$	Delay	Eval Time (s)	MSE_Train	RMSE_Train	MAE_Train	R²_Train	MSE_Test	RMSE_Test	MAE_Test	R²_Test
Gaussian	5	3.1722	7.4569 × 10⁻¹⁵	8.6354 × 10⁻⁸	4.3155 × 10⁻⁸	0.9998	3.4332 × 10⁻¹⁵	5.8594 × 10⁻⁸	3.6890 × 10⁻⁸	0.9999
Sigmoidal	4	0.0111	6.50 × 10⁻¹⁵	8.06 × 10⁻⁸	4.07 × 10⁻⁸	0.9998	5.83 × 10⁻¹⁵	7.64 × 10⁻⁸	4.39 × 10⁻⁸	0.9999
Bell	6	4.2737	7.2937 × 10⁻¹⁵	8.5403 × 10⁻⁸	4.4189 × 10⁻⁸	0.9998	3.8136 × 10⁻¹⁵	6.1755 × 10⁻⁸	3.9547 × 10⁻⁸	0.9999
Triangular	3	4.2068	5.4029 × 10⁻¹¹	7.3505 × 10⁻⁶	3.328 × 10⁻⁶	−0.258	5.9828 × 10⁻¹¹	7.7348 × 10⁻⁶	3.5508 × 10⁻⁶	−0.267

Once sigmoidal

μ_{M F}

is selected, its graphical results are illustrated. Figure 14a presents a comparison between the experimental values and the predictions generated by the best ANFIS model (with

d

= 4) for the

{\hat{v}}_{{C H}_{4}}^{R e}

across all samples. The used dataset is clearly divided by a vertical dashed line into two partitions: the initial 70% of samples are assigned to training, while the remaining 30% are reserved for testing and model validation. In both regions, model predictions (dashed lines) exhibit a remarkable alignment with actual experimental data (solid lines), capturing with high accuracy not only the general trends but also the intricate details—such as sharp peaks and deep valleys—that characterize the output signal.

Figure 14. (a) Comparison between experimental (Real Train: solid blue line; Real Test: solid orange line) and predicted values (Predicted Train: dashed blue line; Predicted Test: dashed orange line) of

{\hat{v}}_{{C H}_{4}}^{R e}

obtained by the best ANFIS model (

d

= 4). The model is trained using 70% of the data and tested on the remaining 30%. Vertical dashed line indicates train/test split. (b) Residuals (prediction errors) for test set using best ANFIS model (

d

= 4).

A closer inspection reveals that the ANFIS model preserves this high fidelity even in segments where the process response is particularly volatile or shows abrupt fluctuations. This is especially noteworthy, as such regions typically pose significant challenges for both conventional and data-driven modeling approaches, often resulting in loss of precision or the introduction of systematic errors. Model’s consistent performance throughout both smooth and highly variable intervals suggests that it is not simply memorizing data (overfitting) but rather learning and internalizing the underlying process dynamics.

Moreover, near-perfect overlap between real and predicted curves in both training and testing subsets underscores the model’s ability to generalize to previously unseen data, which is critical for practical applications in real-world systems. This level of predictive accuracy, even under challenging conditions, highlights the effectiveness of the ANFIS approach when dealing with complex systems where interactions among variables are nonlinear, time-dependent, and potentially obscured by noise.

To complement this visual assessment, Figure 14b provides an in-depth analysis of the model’s predictive accuracy by displaying the residuals— pointwise errors between experimental and predicted values—over the entire test set. The vast majority of these residuals remain tightly clustered around zero, with only isolated instances where more substantial errors occur. Lack of persistent trends, recurring spikes, or clusters of large errors throughout the sample range indicates that the model does not introduce bias, nor does it fail systematically under any particular set of operating conditions. Such a balanced distribution of errors is a desirable attribute, as it suggests that the model’s occasional larger deviations are more likely attributable to inherent data noise or rare process events rather than to structural deficiencies in the model itself.

Another important aspect is that, despite the high degree of scatter and the presence of noise in both input and output variables (as illustrated in Figure 12 and Figure 13), the ANFIS model maintains robustness and does not suffer from degradation in predictive quality as data complexity increases. This robustness is particularly valuable in industrial settings, where sensor noise, process disturbances, and changes in operating regimes are common and can easily compromise the performance of less adaptive models.

Across all tested

d

(see Table 11), ANFIS models achieved outstanding accuracy, with R² values of 0.999 for both subsets and extremely low error metrics (on the order of 10⁻¹⁵ for MSE and less than 10⁻⁷ for RMSE). These results reflect both the suitability of selected variables and the ability of ANFIS to model complex, nonlinear processes.

Table 11. Performance metrics for ANFIS models at different times,

d

. Metrics Eval Time, MSE, RMSE, MAE, R2 for both the training and test datasets. Best result is highlighted in bold.

As

d

increases from 1 to 20, only minor variations are observed in the error metrics. MSE and MAE remain low and relatively stable for

d

between 1 and 9, with a gradual increase in error from

d

= 10 onward. Best predictive performance in the test set is obtained with a

d

of 4, where the lowest MSE and MAE values are recorded. This indicates that incorporating a moderate memory of previous samples enables the model to optimally capture temporal dependencies without introducing unnecessary complexity or overfitting.

To rigorously evaluate the model’s generalization capability, a cross-validation was performed, with the results summarized in Table 12. This analysis confirms the exceptional predictive performance and stability of the ANFIS architecture.

Table 12. Performance metrics for ANFIS models with different time

d

using cross-validation. Metrics Eval Time, MSE, RMSE, MAE, R² for both the training and test datasets. The best result is highlighted in bold.

The results demonstrate outstanding and consistent accuracy for models with short time lags. For delays between

d

= 1 and

d

= 10, the models achieved average R² values of 0.9998 and Mean Square Errors (MSE) on the order of 10−15, indicating high reliability regardless of the data partition.

However, a drastic degradation in performance is observed for lags greater than 10 (

d

> 10), where the R² decreases and errors increase by several orders of magnitude. This behavior suggests that an excess of historical information introduces noise, causing model instability and a loss of its generalization capability.

Based on this analysis, the model with a time lag of

d

= 3 is identified as the optimal configuration. It offers the best balance between near-perfect accuracy, demonstrated by the lowest average MSE (7.256 × 10⁻¹⁵) and computational efficiency, validating its robustness for real-time prediction and control applications.

The architecture of the optimal ANFIS model, corresponding to a time lag of

d

= 3, is illustrated in Figure 15. This structure is notable for its parsimony; although neuro-fuzzy systems can house significant complexity, the preceding statistical analysis demonstrated that a model with a single fuzzy rule (Equation (38)) was sufficient to capture the dynamic process with exceptional precision. The rule follows a first-order Sugeno structure, where if the four input variables (Permeance, Retentate Volume, Retentate Pressure, and Retentate Mix Viscosity) meet their respective membership functions, then the output (methane volume in the retentate) is calculated as a linear combination of those inputs. The fact that such a simple architecture achieves such high performance underscores the effectiveness of the variable selection and the synergy between the mechanistic model and the neuro-fuzzy approach.

I F x_{1} i s {M F}_{1} A N D x_{2} i s {M F}_{2} A N D x_{3} i s {M F}_{3} A N D x_{4} i s {M F}_{4} T H E N y = b_{1} x_{1} + b_{2} x_{2} + b_{3} x_{3} + b_{4} x_{4} + c_{1}

(38)

Figure 15. Architecture of the single-rule ANFIS model for the optimal time lag (d = 3). This title specifies that the diagram represents the structure of the best-performing model identified through cross-validation, which uses a single fuzzy rule and a time delay of three samples.

To conclusively validate the robustness of this optimal model, Table 13 presents a detailed breakdown of the performance metrics for each of the five folds of the cross-validation. The results demonstrate remarkable stability and consistency. Across all folds, the coefficient of determination (R²) remains above 0.999, while the Mean Squared Error (MSE) and Mean Absolute Error (MAE) stay at extremely low orders of magnitude (10⁻¹⁵ and 10⁻⁸, respectively). The minimal deviation in these metrics across the different test subsets confirms that the model’s superior performance is not an artifact of a fortunate data partition but the result of a genuine generalization capability.

Table 13. Cross-validation performance metrics by fold for the optimal ANFIS model (

d

= 3).

Figure 16a illustrates the evolution of the two main learning algorithm parameters: the step size (k), which is updated using heuristic rules proposed by Jang in [41], and the learning rate (η). Unlike an approach with fixed parameters, a dynamic behavior is observed. The learning rate (η), associated with the gradient descent method for nonlinear parameters, exhibits significant fluctuations throughout the 50 epochs. This indicates that the algorithm actively adjusts the magnitude of the updates to efficiently navigate the error surface. Simultaneously, the step size (k), associated with the adjustment of the linear parameters, decreases in a stepwise fashion, suggesting progressive stabilization as the model approaches the optimal solution.

Figure 16. (a) Evolution of the adaptive step size (k) and learning rate (η) across 50 training epochs. (b) Squared error per epoch, illustrating the model’s convergence behavior.

Meanwhile, Figure 16b shows the evolution of the squared error per epoch. Notably, the error converges almost instantaneously to an extremely low error valley (on the order of 10⁻¹²) without showing a typical steep descent curve. The low-amplitude, high-frequency fluctuations observed from that point onward do not indicate instability but are a direct reflection of the micro-adjustments made by the adaptive learning rate shown in Figure 16a. This behavior demonstrates that after a very rapid initial convergence, the algorithm continues a fine-tuning process to ensure the model settles in a deep and stable error minimum.

Taken together, both graphs confirm the efficiency and sophistication of the hybrid training method. The adaptive mechanism allows for swift convergence and precise parameter adjustment, which justifies the superior performance and robustness of the final selected model.

Figure 17 provides a compelling visual validation of the ANFIS model’s performance. It displays a near-perfect superposition of the actual methane volume in the retentate (solid blue line) and the values predicted by the model (dashed yellow line). This exceptional fit is maintained across the entire dataset, capturing with high fidelity not only the general trends but also the abrupt fluctuations and volatility of the process.

Figure 17. Validation of the Optimal ANFIS Model’s Predictive Performance. The plot shows the superposition of the actual (solid blue) and model-predicted (dashed yellow) values for the methane volume

{\hat{v}}_{{C H}_{4}}^{R e}

in the retentate across the entire dataset, demonstrating a near-perfect fit.

The model’s ability to precisely track the sharp peaks and valleys demonstrates that it has successfully learned the complex, nonlinear dynamics of the membrane system. This excellent visual agreement is the graphical representation of the quantitative metrics previously reported, such as a coefficient of determination (R²) close to 0.999. In summary, the plot confirms the model’s robustness and high precision.

The viability of a model for real-time applications depends not only on its accuracy but also on its computational efficiency. Therefore, the inference latency of the optimal ANFIS model was evaluated, and the results are presented in Table 14. This analysis quantifies the time required by the model to generate predictions as a function of the number of samples processed simultaneously (in a batch).

Table 14. Inference Time Analysis for the Optimal ANFIS Model.

For a single sample, the model demonstrates exceptional responsiveness, generating a prediction in just 0.3930 ms. This speed is fundamental for control systems that require immediate decisions. A key finding is the notable gain in efficiency achieved through batch processing; as the number of samples increases, the average inference time per sample decreases drastically. Specifically, when processing a batch of 100 samples, the average time per prediction is reduced to 0.0085 ms, representing an improvement of over 97% compared to individual processing.

It is observed that performance stabilizes for batches of 100 and 500 samples, indicating that the model’s maximum computational efficiency has been reached under these conditions. In addition to speed, the stability of the performance is notable, as the standard deviation significantly decreases for larger batches, which confirms that the prediction times are highly consistent and reliable. Taken together, these results confirm that the ANFIS model is computationally efficient, validating its implementation in real-time monitoring and control applications where response speed is a critical factor.

4. Conclusions

In this work, it has been demonstrated that combining a first-principles transport model with an ANFIS produces a tool that is both powerful and interpretable for modeling CH₄/CO₂ separation in HFMM. By systematically reducing eighteen candidate predictors to only four—permeance, retentate volume, retentate pressure, and mixture viscosity of the retentate—via Pearson correlation, VIF (with ridge regularization), and normalized regression diagnostics, it is also ensured a parsimonious input set that balances physical insight and statistical rigor. This hybrid framework not only preserves mechanistic transparency but also injects flexibility of data-driven learning, so that the “fuzzy” part of ANFIS remains strictly in the logic layer—never clouding our understanding of underlying physics!

While an initial analysis training on 70% of the simulated dataset and validating on the remaining 30% suggested the best ANFIS configuration was the one with a time lag of

d

= 4, a more robust cross-validation was performed to get a more reliable estimate of the model’s generalization capability. This more rigorous method revealed that the model with a time lag of

d

= 3 was the true optimum, and this final configuration achieved an R² of 0.999 and an RMSE below 8 × 10⁻⁸ m³/h for retentate methane volume prediction. Such near-perfect agreement, across both smooth regimes and abrupt dynamics, confirms that the model is anything but “fuzzy” in its accuracy.

Validity is restricted by the assumptions and the training domain. For validation, permeances were assumed constant; strong dependencies on T, P, or composition (e.g., competitive sorption, plasticization, aging/moisture) or module non-idealities (maldistribution, external film resistances) not explicitly parameterized can degrade accuracy. The hydrodynamics are 1D and laminar (Hagen–Poiseuille) with non-deformable fibers; deviations are expected under transitional/turbulent flow, slip/Knudsen regimes at very low pressure, or fiber compaction at high pressure drop. PR-EOS and Wilke/Chapman–Enskog viscosity models are adequate across broad ranges but may require corrections near critical regions, under incipient condensation, or for strongly associating/heavy/impure mixtures (e.g., C3+, H₂S, high humidity). The ANFIS was trained on physics-generated data. While performance was initially evaluated with a 70/30 train/test split, a cross-validation was also implemented to ensure a more reliable estimate of the model’s generalization capability. To this end, future work will focus on the development of a pilot-scale plant to obtain our own experimental data. This will allow for a direct validation of the model’s generalization to real-world conditions and enable further calibration of the intelligent system, bridging the gap between simulation and industrial application.

From an applied standpoint, this hybrid approach fully aligns with Industry 4.0 objectives: it reduces parameter uncertainty, adapts in real time to process disturbances, and serves as an interpretable “soft sensor” for advanced monitoring and control.

In this context, the interpretability of the ANFIS component provides actionable insight for process engineers in two key ways. First, as a decision-support tool, the ANFIS model distills the complex set of differential equations from the mechanistic model into a single, intuitive “IF-THEN” fuzzy rule. This rule acts as a transparent soft sensor, allowing an operator without modeling expertise to gain a qualitative, cause-and-effect understanding of how input variables (such as permeance or retentate pressure) influence the final methane volume, thereby guiding operational adjustments. Second, its simplicity and high computational speed make it the foundation for advanced process optimization. The model is designed to be integrated into an optimization loop coupled with bio-inspired algorithms, using its predictions within a cost function to determine the optimal operating conditions that maximize methane purity or recovery.

That a single membership function per input sufficed to capture complex nonlinearities underscores the wisdom of marrying white-box equations with neuro-fuzzy learning—delivering high precision without unnecessary model bloat. Moreover, this white-box model can be substituted by a data-collection system in physical plants, enabling empirical calibration and adjustment of the Intelligent Model under real operating conditions.

Looking ahead, integrating this ANFIS-enhanced model into multi-objective optimization routines or closed-loop control schemes promises to further boost the efficiency and resilience of gas separation processes. Specifically, and as a key priority for our next investigation, we will conduct the sensitivity analysis on fiber compaction. This analysis will quantify how potential reductions in fiber diameter impact pressure drop and CH₄ recovery, thereby directly addressing the model’s utility for industrial scale-up. Future work could also explore online parameter adaptation under real plant noise, extension to multicomponent mixtures beyond CH₄/CO₂, and pilot-scale validation.

Author Contributions

Conceptualization, B.J.G.-S. and J.A.R.-H. methodology, J.A.R.-H. and B.J.G.-S.; software, B.J.G.-S.; validation, B.J.G.-S. and J.A.R.-H.; writing—original draft preparation, B.J.G.-S. and J.A.R.-H.; writing—review and editing, J.A.R.-H., A.Y.A., M.A.R.C., J.C.G.G., J.-L.R.-L. and F.J.R.-S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

First author acknowledges SECIHTY (formerly CONHACYT) with scholarship number [999970] for its support through a Ph.D. scholarship. The authors also thank the support to UNACAR and U. de G. to establish the academic collaboration network “Sistemas Avanzados, Inteligentes y Bioinspirados Aplicados a la Ingeniería, Tecnología y Control” for providing the facilities, resources, and collaborative environment to carry out this research project.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

AARD	Average Absolute Relative Deviation
ACO	Ant Colony Optimization
AI	Artificial Intelligence
ANFIS	Adaptive Neuro-Fuzzy Inference System
ANN/ANNs	Artificial Neural Network(s)
ARD	Average Relative Deviation
BDF	Backward Differentiation Formula(s)
BR	Bayesian Regularization
CFD	Computational Fluid Dynamics
CSA-LSSVM	Crow Search Algorithm–Least-Squares Support Vector Machine
Cu-BTC	Copper(II) benzene-1,3,5-tricarboxylate (HKUST-1)
DE	Differential Evolution
dsigmf	Difference-of-Sigmoids Membership Function
EUF	Electro-Ultrafiltration (Electro-assisted Ultrafiltration)
FCM	Fuzzy C-Means
FFBP-LM	Feed-Forward Backpropagation with Levenberg–Marquardt
FO	Forward Osmosis
FS	Fumed Silica
GA	Genetic Algorithm
gaussmf	Gaussian membership function
GMDH	Group Method of Data Handling
GP	Genetic Programming
GPU	Gas Permeation Unit
GWO	Gray Wolf Optimizer
HF	Hollow Fiber
HFMM	Hollow Fiber Membrane Module(s)
Jw	Water flux
Js	Reverse salt flux
LM	Levenberg–Marquardt (algorithm)
MAE	Mean Absolute Error
MAD	Mean Absolute Deviation
MD	Molecular dynamic
MF/MFs	Membership Function(s)
MMM	Mixed-Matrix Membrane
MLP	Multilayer Perceptron
MOFs	Metal–Organic Frameworks
MSE	Mean Square Error
NSE	Nash–Sutcliffe Efficiency
NP/NPs	Nanoparticle(s)
PBCC5	Polyethersulfone, 5 kDa
PBSA	Poly(butylene succinate-co-adipate)
PBS	Poly(butylene succinate)
PDMS	Polydimethylsiloxane
PI	Performance Index
PL/PLs	Porous Liquid(s)
PLCC5	Regenerated cellulose, 5 kDa
PMP	Polymethylpentene
PNN	Probabilistic Neural Network
POSS	Polyhedral Oligomeric Silsesquioxane
PR	Peng–Robinson
PS	Polystyrene
PSO	Particle Swarm Optimization
PVAc	Poly(vinyl acetate)
RF	Random Forest
RMSE	Root Mean Square Error
RSM	Response Surface Methodology
SAPO-34	Silicoaluminophosphate-34
SC	Subtractive Clustering
SGD	Stochastic Gradient Descent
TFC	Thin-Film Composite
TSK	Takagi–Sugeno–Kang (Sugeno fuzzy model)
VIF	Variance Inflation Factor

Nomenclature

Latin letters
$a_{i}$	PR parameter for pure component i, $atm \cdot m^{6} \cdot {mol}^{- 2}$
$a_{M F}$	MF slope/shape parameter, $(-)$
$a_{m i x}$	PR mixture parameter, $atm \cdot m^{6} \cdot {mol}^{- 2}$
$b_{i}$	PR parameter for pure component i, $m^{3} \cdot {mol}^{- 1}$
$b_{m i x}$	PR mixture parameter, $m^{3} \cdot {mol}^{- 1}$
$b_{r m}$	Linear consequent coefficient (rule $r$ , input $m$ ), $(unit of y per unit of x_{m})$
$c_{M F}$	MF center, $(-)$
$c_{r}$	Consequent offset (rule $r$ ), $(Units of y)$
$d$	Time lag (delay), $(-)$
$D$	Module (shell) diameter, $m$
$D_{i}$	Inner fiber diameter, $m$
$D_{o}$	Outer fiber diameter, $m$
$E$	Mean squared error in ANFIS training, ${(unit of y)}^{2}$
$F_{i}$	Molar flow rate of component i, $mol \cdot s^{- 1}$
$f_{r}$	Output of rule $i$ , $(unit of y)$
$I_{k_{0}}$	Identity matrix used in ridge-VIF, $(k_{0} \times k_{0})$
$J (β)$	Regression MSE cost function, ${(unit of y)}^{2}$
$k$	Number of selected predictors, $(-)$
$k_{0}$	Initial number of candidate predictors (18), $(-)$
$k_{B}$	Boltzmann constant, $J \cdot K^{- 1}$
$k_{i}$	Empirical parameter in $α_{i}$ , $(-)$
$k_{i j}$	Binary interaction parameter, $(-)$
$L$	Module length, $m$
$M$	Number of fuzzy rules in ANFIS, $(-)$
$M_{i}$	Molar mass of component i, $g \cdot {mol}^{- 1}$
$n$	Number of observations (samples), $(-)$
$N$	Number of Hollow Fiber, $(-)$
$N_{s}$	Number of samples (for MSE in ANFIS), $(-)$
$P$	Pressure on the retentate side, $atm$
$P_{c, i}$	Critical pressure of component i, $atm$
$p$	Pressure on the permeate side, $atm$
${P m}_{i}$	Permeance of component i, $mol \cdot m^{- 2} \cdot {atm}^{- 1} \cdot s^{- 1}$
$Q$	Number of ANFIS inputs, $(-)$
$R$	Universal gas constant, $J \cdot {mol}^{- 1} \cdot K^{- 1}$
$R^{2}$	Coefficient of determination, $(-)$
$R_{x x}$	Predictor correlation matrix, $(-)$
$t$	Time/sample index in ANFIS, $(-)$
$T$	Temperature, $K$
$T_{c, i}$	Critical temperature of component i, $K$
$T^{*}$	Reduced temperature, $(-)$
$u_{i}$	Flow rate on the permeate side of species $i$ , $mol \cdot s^{- 1}$
$v_{i}$	Flow rate on the retentate side of species $i$ , $mol \cdot s^{- 1}$
$V$	Total Volumetric flow rate, $m^{3} \cdot h^{- 1}$
$V_{i}$	Volumetric flow rate of component $i$ , $m^{3} \cdot h^{- 1}$
$V_{m}$	Molar volume, $m^{3} \cdot {mol}^{- 1}$
$w_{r}$	Rule firing strength, $(-)$
${\bar{w}}_{r}$	Normalized rule firing strength, $(-)$
$x_{m}$	ANFIS input $m$ , $(-)$
$x_{s, q}$	Value of predictor $q$ at observation $s$ , $(-)$
$x_{s, q}^{(n)}$	Normalized predictors,
$y$	ANFIS aggregated output, $(-)$
$y^{(n)}$	Normalized response, $(-)$
$\hat{y}$	Predicted output, $(-)$
${\hat{y}}^{(n)}$	Predicted output normalized, $(-)$
$z$	Axial coordinate along the module Length, $m$
Greek symbols
$α_{i} (T)$	Temperature dependence in Peng-Robinson equation, $(-)$
$β_{0}$	Intercept, $(units of y)$
$β_{q}$	Regression coefficients $(units of y per units of x_{s, q})$
$β$	Coefficient vector, $(units of y per units of x_{s, q})$
$ε$	Depth of the potential well in the Lennard–Jones potential, $K$
$χ_{i}$	Mole fraction of component $i$ , $(-)$
$η$	Learning rate in ANFIS gradient descent, $(-)$
$λ$	Ridge penalty in ridge-VIF, $(-)$
$μ_{i}$	Pure-gas viscosity of component $i$ , Pa·s
$μ_{m i x}$	Mixture dynamic viscosity, Pa·s
$μ_{x}$	Column-wise mean vector of X, $(-)$
$μ_{y}$	Mean scalar of response y, $(-)$
$μ_{M F}$	Membership function, $(-)$
$σ$	Lennard–Jones collision diameter, $m$
$σ_{x}$	Column-wise std. dev. vector of X, $(-)$
$σ_{y}$	Std. dev. scalar of response y, $(-)$
$ρ_{q l}$	Pearson correlation between predictors $q$ and $l$ , $(-)$
$φ_{i j}$	Binary weighting factor, $(-)$
$Ω_{μ}$	Collision integral, $(-)$
$ω_{i}$	Acentric factor of component $i$ , $(-)$
Subscripts and Superscripts
$R_{e}$	Refers to the retentate stream (non-permeated)
$P_{e}$	Refers to the permeate stream (permeated)
$i$	Refers to a specific chemical component (e.g., CO₂, CH₄)
$m i x$	Refers to a property of the gas mixture
$v a r$	Refers to a variable parameter in the simulation

Appendix A

For full reproducibility, Appendix A reports the CH₄/CO₂ parameters used in this work (see Table A1, Table A2 and Table A3). While widely available in the classical literature, they are restated here to avoid ambiguity; we follow Bird, Stewart, and Lightfoot [43].

Table A1. Molecular weights.

Species	$M_{i}$
CH₄	16.04
CO₂	44.01

Table A2. Critical properties and acentric factor.

Species	$T_{c, i}$	$P_{c, i}$	$ω_{i}$
CH₄	190.7	45.8	0.011
CO₂	304.2	72.9	0.225

Table A3. Lennard–Jones parameters.

Species	$σ$	$\frac{ε}{k_{B}}$
CH₄	3.882	137
CO₂	3.996	190

Algorithm A1 Gas Separation Membrane Simulation Process and Validation. Pseudocode for the simulation and validation of the mechanistic membrane model.

// PHASE 1: SETUP AND INITIALIZATION
1  DEFINE FixedSystemParameters:
2 - Membrane module geometry (Length, Diameters, Number of Fibers)
3 - Operating conditions (Temperature, Inlet Pressures)
4 - Physical and gas constants (R, Molar Masses, etc.)
5  DEFINE MembraneProperties:
6 - Permeance coefficients (Pi) for each gas (CO₂, CH₄)
7  DEFINE StudyCases:
8 - Load list of feed flow rates to simulate
9  LOAD ReferenceData:
10 - Load experimental results (purity, recovery, etc.)
11 - Load results from a comparison model (“Ko Model”)
12 DEFINE
13 Solver Type: ode15s (variable-step method for stiff systems).
14 Relative Tolerance (RelTol): 1 × 10⁻⁶.
15 Absolute Tolerance (AbsTol): 1 × 10⁻¹²
16 Integration Interval: [0, L] (along the axial axis of the module).

17 INITIALIZE an empty data structure to store the simulation results.

// PHASE 2: MAIN SIMULATION LOOP
18 FOR EACH feed_flow_rate IN the list of StudyCases:
19 // a. Prepare initial conditions for the solver
20 CALCULATE the total molar inlet flow based on the feed_flow_rate.
21 CONSTRUCT the initial state vector Q0 at the module inlet (z = 0).

22 // b. Solve the mathematical model along the module length
23 CALL the Ordinary Differential Equation (ODE) Solver with: massPressureModel, range [0, L], and Q0.
24 GET the final state vector Qf at the module outlet.

25 // c. Post-process and store the results for this simulation
26 CALCULATE performance metrics (Purity, Recovery, etc.) from Qf.
27 STORE these calculated metrics in the results structure.
28 END FOR

// PHASE 3: RESULTS ANALYSIS AND VISUALIZATION
29 CALCULATE errors (Relative, MAE, and RMSE) by comparing simulation vs. experimental data.
30 GENERATE Performance Plots (Purity and Recovery vs. Flow Rate).
31 GENERATE Error Plots (Bar charts for comparison).
32 END

Algorithm A2 Generation of Synthetic Data via Parametric Exploration. Pseudocode for the generation of the synthetic dataset via parametric exploration.

// PHASE 1: SETUP AND PARAMETER DEFINITION

1 DEFINE FixedSystemParameters (T, Geometry, Pressures, n_total, etc.).
2 DEFINE SpeciesProperties (Molar fractions, Molar masses M, Pmi_base, etc.).
3 DEFINE RandomSimulationParameters:
4 - Index of the component to vary (var_index = CH₄).
5 - Number of simulations (num_segments).
6 DEFINE
7 Solver Type: ode15s (variable-step method for stiff systems).
8 Relative Tolerance (RelTol): 1 × 10⁻⁶.
9 Absolute Tolerance (AbsTol): 1 × 10⁻¹²

10 CALCULATE lower and upper limits for the variable permeance (lim_inf_pmi, lim_sup_pmi).
11 GENERATE a vector Pmi_range with num_segments random permeance values within the limits.

// PHASE 2: MAIN SIMULATION LOOP
12 INITIALIZE a results matrix ZF (num_segments x 18) with NaN values.
13 FOR EACH Pmi_varied IN the Pmi_range vector:
14 INSIDE a Try-Catch block to handle simulation errors:
15 // a. Configure conditions for the current simulation
16 ASSIGN the Pmi_varied to the permeance vector Pmi.
17 CONSTRUCT the initial state vector Q0 (flows and pressures at z = 0).
18 GROUP all parameters into a ‘sim’ structure for the solver.

20 // b. Solve the mathematical model
21 CALL the ODE Solver (ode15s) with the ‘modeloMasaPresion’ model, Q0, and ‘sim’.
22 GET the solution matrix Qsol.

24 // c. Post-process and calculate performance metrics
25 EXTRACT the final values of flows (FiR, FiP) and pressures from Qsol.
26 CALCULATE the 18 output metrics (volumes, fractions, viscosities, etc.).
27 STORE the calculated metrics as a new row in the ZF matrix.
28 END FOR
29 FILTER ZF to remove any rows containing errors (NaNs).

// PHASE 3: STRUCTURING THE FINAL DATASET
30 CONVERT the numerical matrix ZF to a structured table (tabla_ZF) using predefined headers.
31 RETURN tabla_ZF as the synthetic dataset for statistical analysis and ANFIS model training.
32 END

Algorithm A3 Statistical Analysis and Multivariate Regression Diagnostics. Pseudocode for the statistical analysis and variable selection workflow.

// PHASE 1: SETUP AND PRELIMINARY DIAGNOSTICS
1 LOAD the dataset (tabla_ZF).
2 PROMPT user to select the dependent variable (Y) and independent variables (X).
3 FOR EACH variable in X:
4 CALCULATE and DISPLAY descriptive statistics (standard deviation, min/max).
5 END FOR
6 CALCULATE and DISPLAY the rank of matrix X for an initial multicollinearity check.
  // PHASE 2: FITTING REGRESSION MODELS
7 // -- Multiple Linear Regression Model --
8 DEFINE GradientDescentHyperparameters:
9 Learning Rate (alpha): 0.01
10   Number of iterations: 1000
11 NORMALIZE the X and Y matrices.
12 FIT the model coefficients (theta) using Gradient Descent on the normalized data.
13 DENORMALIZE the predictions (Y_pred) back to the original scale.
14 CALCULATE the model's global performance metrics (R², MAE, MAPE).
15 // -- Simple Linear Regression Models --
16 FOR EACH variable in X:
17   FIT a simple regression model against Y.
18   CALCULATE and STORE its individual performance metrics (R², MAE, MAPE).
19 END FOR
  // PHASE 3: MULTICOLLINEARITY ANALYSIS
20 CALCULATE the Pearson correlation matrix (Rxx) for the variables in X.
21 FOR EACH variable in X:
22   CALCULATE the Variance Inflation Factor (VIF) using several methods (Classic and with Ridge regularization).
23   Ridge VIF: Using a fixed regularization parameter (lambda = 0.03).
24 END FOR
25 CALCULATE the Condition Index from the eigenvalues of Rxx.
26 GENERATE a summary table with the VIF and Condition Index results.
  // PHASE 4: GENERATION OF VISUAL DIAGNOSTICS AND REPORTS
27 CALCULATE the residuals of the multiple regression model (resid = Y - Y_pred).
28 GENERATE a Heatmap of the correlation matrix Rxx with labels and values.
29 GENERATE a Residuals vs. Predicted Values plot to check for homoscedasticity.
30 GENERATE a Q-Q plot of the residuals to check for normality.
31 GENERATE a histogram of the residual distribution.
32 GENERATE an Actual vs. Predicted Values plot to evaluate the overall fit.
33 GENERATE an Absolute Error per Sample plot to identify outliers.
34 GENERATE final summary tables with the model coefficients and performance metrics.
35 END

Algorithm A4 Optimal Lag Search for ANFIS Model with Fixed Split (Train/Test). Pseudocode for the optimal lag search and training of the ANFIS model.

  // PHASE 1: INITIAL SETUP
1   LOAD the pre-processed dataset.
2   DEFINE model hyperparameters (I/O indices, max_lags, M, mf_type, k0, max_iters).
3   INITIALIZE Random Number Generator (Seed = 45) for reproducibility.
4   INITIALIZE structures to store metrics, latencies, and parameters for each trained model.
  // PHASE 2: OPTIMAL LAG SEARCH
5   FOR EACH time lag (lag) FROM 1 TO max_lags:
6 // a. Data Preparation and Splitting
7 GENERATE the input-output dataset by applying the current lag.
8 SPLIT the data into a training set (70%) and a testing set (30%).
9 // b. Model Training
10 INITIALIZE the ANFIS model parameters (a, b, c).
11 FOR EACH epoch FROM 1 TO max_iters:
12 // Step 1: Forward Pass & Least Squares Estimate
13 CALCULATE the consequent parameters (params) using a Least Squares estimate (A\B).
14 // Step 2: Backward Pass & Gradient Descent
15 CALCULATE the error gradient with respect to the premise parameters.
16 UPDATE the premise parameters (a, b, c) using Gradient Descent.
17 // Step 3: Adaptive Learning Rate
18 ADJUST the learning rate (k) based on the error trend of the last 5 epochs (e.g., increase by 10% on steady decrease, decrease by 10%
on oscillation)squares and Gradient Descent with an adaptive learning rate).
19 END FOR
20 // c. Evaluation and Storage
21 MAKE predictions on both the training and testing sets.
22 CALCULATE performance metrics (MSE, RMSE, MAE, R²) for both sets.
23 STORE the calculated metrics for the current lag.
24 CALCULATE and STORE the model’s inference latency.
25 SAVE all trained model parameters and results (a, b, c, params, test data, predictions, etc.).
26 END FOR
// PHASE 3: RESULTS ANALYSIS AND REPORT GENERATION
27 IDENTIFY the best lag (best_idx) based on the lowest MSE on the test set.
28 // -- General and Best Model Reports --
29 GENERATE an Error vs. Latency plot to visualize the trade-off of all evaluated models.
30 GENERATE a summary table with the key performance of the best model (lag, MSE, latency).
31 GENERATE a general table with the performance metrics for all evaluated lags.
32 IDENTIFY the top N models (e.g., Top 7) based on the test set MSE.
33 FOR EACH of the top N models:
34 RETRIEVE its saved parameters and results.

Algorithm A5 ANFIS Model Training and Cross-Validation. Pseudocode for the ANFIS model training and k-fold cross-validation.

// PHASE 1: INITIAL SETUP
1   LOAD the pre-processed dataset.
2   DEFINE model hyperparameters (I/O indices, max_lags, M, mf_type, k_folds, etc.).
3   INITIALIZE structures to store average metrics and standard deviation.
4 INITIALIZE a container (all_lags_fold_metrics) to store the detailed metrics for EACH FOLD.

// PHASE 2: OPTIMAL LAG SEARCH VIA CROSS-VALIDATION
5   FOR EACH time lag (lag) FROM 1 TO max_lags:
6    // a. Data Preparation and Partitioning
7    GENERATE the input-output dataset by applying the current lag.
8    CREATE k_folds partitions of the data for cross-validation.

9    // b. Training and Evaluation of the current fold
10    FOR EACH partition (fold) FROM 1 TO k_folds:
11    SPLIT the data into training and testing sets.
12    TRAIN the ANFIS model using the training data.
13    EVALUATE the trained model and CALCULATE performance metrics (MSE, RMSE, MAE, R²).
14    CALCULATE the inference latency and STORE all metrics for this fold.
15    END FOR

16    // c. Consolidation of results for the current lag
17    CALCULATE the average and standard deviation of the metrics from all folds.
18    STORE the consolidated results (average and std. dev.) for the current lag.
19    STORE the complete matrix with the metrics from each fold in ‘all_lags_fold_metrics’.
20    END FOR

// PHASE 3: BEST MODEL IDENTIFICATION AND REPORTING
21 IDENTIFY the best lag (best_idx) based on the lowest average MSE on the test set.

22 // -- Generate summary table for the best model --
23 RETRIEVE the saved test metrics for each fold of the best lag (best_idx) from the container.
24 CALCULATE the average and standard deviation of these retrieved metrics.
25 GENERATE a detailed summary table with the metrics for each fold of the best model.

   // PHASE 4: REPORT GENERATION
26 GENERATE a summary table comparing the average performance of all evaluated lags.
27 GENERATE an Error vs. Latency plot to visualize the model trade-offs.
28 GENERATE diagnostic plots for the final model (Actual vs. Predicted, error curve, etc.).
29 END

Table A4. Performance metrics for ANFIS with Gaussian Membership function models with different time d using cross-validation. Metrics Eval Time, MSE, RMSE, MAE, R² for both the training and test datasets. Best result is highlighted in bold.

Delay	Eval Time(s)	MSE_Train	RMSE_Train	MAE_Train	R²_Train	MSE_Test	RMSE_Test	MAE_Test	R²_Test
1	1.4773	5.4903 × 10⁻¹¹	7.4097 × 10⁻⁶	3.4251 × 10⁻⁶	−0.2717	5.7991 × 10⁻¹¹	7.6152 × 10⁻⁶	3.3465 × 10⁻⁶	−0.2393
2	1.8107	5.1806 × 10⁻¹¹	7.1976 × 10⁻⁶	3.1886 × 10⁻⁶	−0.2442	6.5415 × 10⁻¹¹	8.0880 × 10⁻⁶	3.9106 × 10⁻⁶	−0.3051
3	2.2136	4.7748 × 10⁻¹⁵	6.9100 × 10⁻⁸	3.4422 × 10⁻⁸	0.9999	1.0148 × 10⁻¹⁴	1.0074 × 10⁻⁷	4.2528 × 10⁻⁸	0.9998
4	2.6798	5.9851 × 10⁻¹¹	7.7364 × 10⁻⁶	3.5949 × 10⁻⁶	−0.2754	4.6392 × 10⁻¹¹	6.8112 × 10⁻⁶	2.9390 × 10⁻⁶	−0.2288
5	3.1722	7.4569 × 10⁻¹⁵	8.6354 × 10⁻⁸	4.3155 × 10⁻⁸	0.9998	3.4332 × 10⁻¹⁵	5.8594 × 10⁻⁸	3.6890 × 10⁻⁸	0.9999
6	3.6361	5.7754 × 10⁻¹¹	7.5996 × 10⁻⁶	3.4671 × 10⁻⁶	−0.2629	5.0826 × 10⁻¹¹	7.1292 × 10⁻⁶	3.2077 × 10⁻⁶	−0.2538
7	4.0638	5.8503 × 10⁻¹¹	7.6487 × 10⁻⁶	3.5729 × 10⁻⁶	−0.2791	4.9241 × 10⁻¹¹	7.0172 × 10⁻⁶	2.9706 × 10⁻⁶	−0.2183
8	4.6857	7.2461 × 10⁻¹⁵	8.5124 × 10⁻⁸	4.4180 × 10⁻⁸	0.9998	4.1107 × 10⁻¹⁵	6.4115 × 10⁻⁸	3.8834 × 10⁻⁸	0.9999
9	5.0098	5.7740 × 10⁻¹⁵	7.5987 × 10⁻⁸	4.0684 × 10⁻⁸	0.9999	7.3984 × 10⁻¹⁵	8.6014 × 10⁻⁸	4.4977 × 10⁻⁸	0.9999
10	5.4081	5.5581 × 10⁻¹¹	7.4553 × 10⁻⁶	3.3961 × 10⁻⁶	−0.2618	5.6650 × 10⁻¹¹	7.5266 × 10⁻⁶	3.4191 × 10⁻⁶	−0.2600
11	6.4764	6.0751 × 10⁻¹⁵	7.7943 × 10⁻⁸	4.1210 × 10⁻⁸	0.9999	6.8112 × 10⁻¹⁵	8.2530 × 10⁻⁸	5.2199 × 10⁻⁸	0.9999
12	7.0233	5.9000 × 10⁻¹⁵	7.6811 × 10⁻⁸	4.1660 × 10⁻⁸	0.9999	6.9338 × 10⁻¹⁵	8.3270 × 10⁻⁸	4.1058 × 10⁻⁸	0.9998
13	7.1029	5.3830 × 10⁻¹¹	7.3369 × 10⁻⁶	3.2650 × 10⁻⁶	−0.2469	6.0972 × 10⁻¹¹	7.8085 × 10⁻⁶	3.7264 × 10⁻⁶	−0.2949
14	7.7078	5.0582 × 10⁻¹⁵	7.1121 × 10⁻⁸	3.9266 × 10⁻⁸	0.9999	1.1384 × 10⁻¹⁴	1.0670 × 10⁻⁷	4.8726 × 10⁻⁸	0.9997
15	7.9784	4.3128 × 10⁻¹⁵	6.5672 × 10⁻⁸	3.6525 × 10⁻⁸	0.9999	1.1275 × 10⁻¹⁴	1.0618 × 10⁻⁷	4.4277 × 10⁻⁸	0.9998
16	8.4660	5.8757 × 10⁻¹⁵	7.6653 × 10⁻⁸	4.3189 × 10⁻⁸	0.9999	6.9371 × 10⁻¹⁵	8.3289 × 10⁻⁸	4.7004 × 10⁻⁸	0.9998
17	8.8192	5.4818 × 10⁻¹⁵	7.4039 × 10⁻⁸	4.0819 × 10⁻⁸	0.9999	9.4202 × 10⁻¹⁵	9.7058 × 10⁻⁸	4.5190 × 10⁻⁸	0.9998
18	9.3072	6.5666 × 10⁻¹⁵	8.1035 × 10⁻⁸	4.6366 × 10⁻⁸	0.9999	5.1686 × 10⁻¹⁵	7.1893 × 10⁻⁸	4.5648 × 10⁻⁸	0.9999
19	9.7245	6.8630 × 10⁻¹⁵	8.2843 × 10⁻⁸	4.6651 × 10⁻⁸	0.9998	4.3345 × 10⁻¹⁵	6.5837 × 10⁻⁸	4.6216 × 10⁻⁸	0.9999
20	10.4467	6.5070 × 10⁻¹⁵	8.0666 × 10⁻⁸	4.5100 × 10⁻⁸	0.9999	5.1161 × 10⁻¹⁵	7.1527 × 10⁻⁸	4.3540 × 10⁻⁸	0.9999

Table A5. Performance metrics for ANFIS with Bell Membership function models with different time d using cross-validation. Metrics Eval Time, MSE, RMSE, MAE, R² for both the training and test datasets. Best result is highlighted in bold.

Delay	Eval Time(s)	MSE_Train	RMSE_Train	MAE_Train	R²_Train	MSE_Test	RMSE_Test	MAE_Test	R²_Test
1	1.5262	5.9250 × 10⁻¹⁵	7.6974 × 10⁻⁸	3.7942 × 10⁻⁸	0.9999	7.3363 × 10⁻¹⁵	8.5652 × 10⁻⁸	4.2986 × 10⁻⁸	0.9998
2	2.0329	6.5181 × 10⁻¹⁵	8.0735 × 10⁻⁸	4.0019 × 10⁻⁸	0.9999	5.7547 × 10⁻¹⁵	7.5860 × 10⁻⁸	3.8586 × 10⁻⁸	0.9999
3	2.6502	5.8358 × 10⁻¹¹	7.6393 × 10⁻⁶	3.5393 × 10⁻⁶	−0.2733	4.9722 × 10⁻¹¹	7.0514 × 10⁻⁶	3.0590 × 10⁻⁶	−0.2318
4	3.1597	1.2387 × 10⁻¹³	3.5196 × 10⁻⁷	1.5528 × 10⁻⁷	0.9967	2.1791 × 10⁻¹³	4.6681 × 10⁻⁷	2.3510 × 10⁻⁷	0.9962
5	3.7116	2.1288 × 10⁻¹³	4.6138 × 10⁻⁷	2.1905 × 10⁻⁷	0.9950	2.6294 × 10⁻¹³	5.1278 × 10⁻⁷	2.5875 × 10⁻⁷	0.9946
6	4.2737	7.2937 × 10⁻¹⁵	8.5403 × 10⁻⁸	4.4189 × 10⁻⁸	0.9998	3.8136 × 10⁻¹⁵	6.1755 × 10⁻⁸	3.9547 × 10⁻⁸	0.9999
7	4.8130	8.0846 × 10⁻¹⁵	8.9915 × 10⁻⁸	5.1029 × 10⁻⁸	0.9998	7.0667 × 10⁻¹⁵	8.4064 × 10⁻⁸	5.0239 × 10⁻⁸	0.9998
8	5.3189	4.3471 × 10⁻¹⁴	2.0850 × 10⁻⁷	1.1121 × 10⁻⁷	0.9991	3.4166 × 10⁻¹⁴	1.8484 × 10⁻⁷	9.5496 × 10⁻⁸	0.9991
9	5.8998	6.3681 × 10⁻¹⁵	7.9800 × 10⁻⁸	4.1977 × 10⁻⁸	0.9999	6.1065 × 10⁻¹⁵	7.8144 × 10⁻⁸	4.1437 × 10⁻⁸	0.9999
10	6.4539	2.9457 × 10⁻¹³	5.4274 × 10⁻⁷	2.6237 × 10⁻⁷	0.9934	2.8870 × 10⁻¹³	5.3731 × 10⁻⁷	2.6382 × 10⁻⁷	0.9933
11	7.0060	5.6392 × 10⁻¹¹	7.5095 × 10⁻⁶	3.4379 × 10⁻⁶	−0.2640	5.4435 × 10⁻¹¹	7.3780 × 10⁻⁶	3.2942 × 10⁻⁶	−0.2478
12	7.6982	3.2737 × 10⁻⁷	5.7217 × 10⁻⁴	2.6318 × 10⁻⁴	−7334.3460	3.1597 × 10⁻⁷	5.6211 × 10⁻⁴	2.5221 × 10⁻⁴	−7229.1041
13	8.3414	1.6842 × 10⁻¹¹	4.1039 × 10⁻⁶	1.8579 × 10⁻⁶	0.6067	1.9071 × 10⁻¹¹	4.3670 × 10⁻⁶	2.0322 × 10⁻⁶	0.6026
14	8.9202	5.3176 × 10⁻¹¹	7.2922 × 10⁻⁶	3.3594 × 10⁻⁶	−0.1307	4.2280 × 10⁻¹¹	6.5023 × 10⁻⁶	2.8812 × 10⁻⁶	−0.1087
15	9.3410	1.9545 × 10⁻¹¹	4.4210 × 10⁻⁶	2.0286 × 10⁻⁶	0.5593	1.9552 × 10⁻¹¹	4.4218 × 10⁻⁶	1.9978 × 10⁻⁶	0.5625
16	10.0022	8.8777 × 10⁻¹⁵	9.4222 × 10⁻⁸	5.5752 × 10⁻⁸	0.9998	9.5581 × 10⁻¹⁵	9.7766 × 10⁻⁸	5.6664 × 10⁻⁸	0.9998
17	10.5615	4.1529 × 10⁻¹¹	6.4443 × 10⁻⁶	2.9652 × 10⁻⁶	0.0773	3.9082 × 10⁻¹¹	6.2516 × 10⁻⁶	2.7742 × 10⁻⁶	0.0947
18	11.0355	1.2335 × 10⁻¹¹	3.5121 × 10⁻⁶	1.6065 × 10⁻⁶	0.7211	1.2656 × 10⁻¹¹	3.5575 × 10⁻⁶	1.6355 × 10⁻⁶	0.7197
19	11.5248	3.7456 × 10⁻¹⁴	1.9353 × 10⁻⁷	1.0576 × 10⁻⁷	0.9992	3.4843 × 10⁻¹⁴	1.8666 × 10⁻⁷	1.0325 × 10⁻⁷	0.9992
20	12.2448	5.8600 × 10⁻⁹	7.6551 × 10⁻⁵	3.4059 × 10⁻⁵	−136.5574	7.0564 × 10⁻⁹	8.4002 × 10⁻⁵	4.0930 × 10⁻⁵	−143.4051

Table A6. Performance metrics for ANFIS with Triangular Membership function models with different time d using cross-validation. Metrics Eval Time, MSE, RMSE, MAE, R² for both the training and test datasets. Best result is highlighted in bold.

Delay	Eval Time(s)	MSE_Train	RMSE_Train	MAE_Train	R²_Train	MSE_Test	RMSE_Test	MAE_Test	R²_Test
1	2.3089	5.2895 × 10⁻¹¹	7.2729 × 10⁻⁶	3.2705 × 10⁻⁶	−0.2535	6.2693 × 10⁻¹¹	7.9179 × 10⁻⁶	3.7083 × 10⁻⁶	−0.2810
2	3.2791	5.6158 × 10⁻¹¹	7.4939 × 10⁻⁶	3.4448 × 10⁻⁶	−0.2679	5.5241 × 10⁻¹¹	7.4324 × 10⁻⁶	3.3119 × 10⁻⁶	−0.2477
3	4.2068	5.4029 × 10⁻¹¹	7.3505 × 10⁻⁶	3.3286 × 10⁻⁶	−0.2580	5.9828 × 10⁻¹¹	7.7348 × 10⁻⁶	3.5508 × 10⁻⁶	−0.2670
4	5.1379	5.7863 × 10⁻¹¹	7.6068 × 10⁻⁶	3.4970 × 10⁻⁶	−0.2680	5.1050 × 10⁻¹¹	7.1449 × 10⁻⁶	3.1684 × 10⁻⁶	−0.2448
5	6.1085	5.4894 × 10⁻¹¹	7.4091 × 10⁻⁶	3.3714 × 10⁻⁶	−0.2611	5.8187 × 10⁻¹¹	7.6280 × 10⁻⁶	3.4739 × 10⁻⁶	−0.2617
6	7.1356	5.5803 × 10⁻¹¹	7.4701 × 10⁻⁶	3.3211 × 10⁻⁶	−0.2464	5.5382 × 10⁻¹¹	7.4419 × 10⁻⁶	3.5486 × 10⁻⁶	−0.2943
7	8.0671	5.5216 × 10⁻¹¹	7.4307 × 10⁻⁶	3.3730 × 10⁻⁶	−0.2595	5.6945 × 10⁻¹¹	7.5462 × 10⁻⁶	3.4391 × 10⁻⁶	−0.2622
8	8.9672	5.4187 × 10⁻¹¹	7.3612 × 10⁻⁶	3.2155 × 10⁻⁶	−0.2358	5.9538 × 10⁻¹¹	7.7161 × 10⁻⁶	3.8189 × 10⁻⁶	−0.3244
9	10.0539	5.4872 × 10⁻¹¹	7.4076 × 10⁻⁶	3.2907 × 10⁻⁶	−0.2459	5.8120 × 10⁻¹¹	7.6236 × 10⁻⁶	3.6540 × 10⁻⁶	−0.2983
10	10.8480	5.7613 × 10⁻¹¹	7.5904 × 10⁻⁶	3.4500 × 10⁻⁶	−0.2604	5.1908 × 10⁻¹¹	7.2047 × 10⁻⁶	3.2934 × 10⁻⁶	−0.2641
11	12.9791	5.3992 × 10⁻¹¹	7.3479 × 10⁻⁶	3.3587 × 10⁻⁶	−0.2641	6.0230 × 10⁻¹¹	7.7608 × 10⁻⁶	3.4848 × 10⁻⁶	−0.2525
12	13.7962	5.7085 × 10⁻¹¹	7.5555 × 10⁻⁶	3.4768 × 10⁻⁶	−0.2686	5.3181 × 10⁻¹¹	7.2926 × 10⁻⁶	3.2202 × 10⁻⁶	−0.2422
13	15.0747	5.7818 × 10⁻¹¹	7.6038 × 10⁻⁶	3.5455 × 10⁻⁶	−0.2778	5.1663 × 10⁻¹¹	7.1877 × 10⁻⁶	3.0714 × 10⁻⁶	−0.2234
14	15.6277	6.1075 × 10⁻¹¹	7.8150 × 10⁻⁶	3.6762 × 10⁻⁶	−0.2842	4.4209 × 10⁻¹¹	6.6490 × 10⁻⁶	2.7757 × 10⁻⁶	−0.2110
15	16.4818	5.8956 × 10⁻¹¹	7.6783 × 10⁻⁶	3.5931 × 10⁻⁶	−0.2804	4.9373 × 10⁻¹¹	7.0266 × 10⁻⁶	2.9826 × 10⁻⁶	−0.2198
16	16.9554	5.5234 × 10⁻¹¹	7.4319 × 10⁻⁶	3.3770 × 10⁻⁶	−0.2602	5.7816 × 10⁻¹¹	7.6037 × 10⁻⁶	3.4604 × 10⁻⁶	−0.2612
17	18.6361	5.5712 × 10⁻¹¹	7.4640 × 10⁻⁶	3.4169 × 10⁻⁶	−0.2651	5.6891 × 10⁻¹¹	7.5426 × 10⁻⁶	3.3786 × 10⁻⁶	−0.2510
18	18.9670	5.6679 × 10⁻¹¹	7.5285 × 10⁻⁶	3.4563 × 10⁻⁶	−0.2670	5.4818 × 10⁻¹¹	7.4040 × 10⁻⁶	3.2981 × 10⁻⁶	−0.2476
19	19.7371	5.4637 × 10⁻¹¹	7.3917 × 10⁻⁶	3.3915 × 10⁻⁶	−0.2667	5.9781 × 10⁻¹¹	7.7318 × 10⁻⁶	3.4613 × 10⁻⁶	−0.2506
20	20.7408	6.0140 × 10⁻¹¹	7.7550 × 10⁻⁶	3.5966 × 10⁻⁶	−0.2740	4.7127 × 10⁻¹¹	6.8649 × 10⁻⁶	2.9941 × 10⁻⁶	−0.2349

References

Agwu, O.E.; Alatefi, S.; Azim, R.A.; Alkouh, A. Applications of artificial intelligence algorithms in artificial lift systems: A critical review. Flow Meas. Instrum. 2024, 97, 102613. [Google Scholar] [CrossRef]
Lu, H.; Guo, L.; Azimi, M.; Huang, K. Oil and gas 4.0 era: A systematic review and outlook. Comput. Ind. 2019, 111, 68–90. [Google Scholar] [CrossRef]
Chu, Y.; Lindbråthen, A.; Lei, L.; He, X.; Hillestad, M. Mathematical modeling and process parametric study of CO₂ removal from natural gas by hollow fiber membranes. Chem. Eng. Res. Des. 2019, 148, 45–55. [Google Scholar] [CrossRef]
Gu, B. Mathematical modelling and simulation of CO₂ removal from natural gas using hollow fibre membrane modules. Korean Chem. Eng. Res. 2022, 60, 51–61. [Google Scholar] [CrossRef]
Ko, D. Development of a dynamic simulation model of a hollow fiber membrane module to sequester CO₂ from coalbed methane. J. Membr. Sci. 2018, 546, 258–269. [Google Scholar] [CrossRef]
Gbadamosi, A.O.; Junin, R.; Manan, M.A.; Agi, A.; Yusuff, A.S. An overview of chemical enhanced oil recovery: Recent advances and prospects. Int. Nano Lett. 2019, 9, 171–202. [Google Scholar] [CrossRef]
Baghban, A.; Azar, A.A. ANFIS modeling of CO₂ separation from natural gas using hollow fiber polymeric membrane. Energy Sources Part A Recovery Util. Environ. Eff. 2017, 40, 193–199. [Google Scholar] [CrossRef]
Rezakazemi, M.; Azarafza, A.; Dashti, A.; Shirazian, S. Development of hybrid models for prediction of gas permeation through FS/POSS/PDMS nanocomposite membranes. Int. J. Hydrogen Energy. 2018, 43, 17283–17294. [Google Scholar] [CrossRef]
Alibak, A.H.; Alizadeh, S.M.; Davodi Monjezi, S.; Alizadeh, A.; Alobaid, F.; Aghel, B. Developing a hybrid neuro-fuzzy method to predict carbon dioxide (CO₂) permeability in mixed matrix membranes containing SAPO-34 zeolite. Membranes 2022, 12, 1147. [Google Scholar] [CrossRef]
Abdollahi, S.A.; Ranjbar, S.F. Modeling the CO₂ separation capability of poly(4-methyl-1-pentane) membrane modified with different nanoparticles by artificial neural networks. Sci. Rep. 2023, 13, 8812. [Google Scholar] [CrossRef]
Kim, N.E.; Basak, J.K.; Kim, H.T. Application of hollow fiber membrane for the separation of carbon dioxide from atmospheric air and assessment of its distribution pattern in a greenhouse. Atmosphere 2023, 14, 299. [Google Scholar] [CrossRef]
Ames, A.; Flores-López, L.Z.; Rogel-Hernandez, E.; Castro, J.R.; Wakida, F.T.; Espinoza-Gomez, H. Electro-cross-flow ultrafiltration system for the rejection of nickel ions from aqueous solution, and Sugeno fuzzy model simulation. Chem. Eng. Commun. 2015, 202, 936–945. [Google Scholar] [CrossRef]
Fathi, S.; Rezaei, A.; Mohadesi, M.; Nazari, M. PSO-ANFIS and ANN modeling of propane/propylene separation using Cu-BTC adsorbent. J. Chem. Pet. Eng. 2019, 53, 191–201. [Google Scholar] [CrossRef]
Dashti, A.; Raji, M.; Azarafza, A.; Rezakazemi, M.; Shirazian, S. Computational simulation of CO₂ sorption in polymeric membranes using genetic programming. Arab. J. Sci. Eng. 2020, 45, 7655–7666. [Google Scholar] [CrossRef]
Babanezhad, M.; Nakhjiri, A.T.; Shirazian, S. Changes in the number of membership functions for predicting the gas volume fraction in two-phase flow using grid partition clustering of the ANFIS method. ACS Omega 2020, 5, 16284–16291. [Google Scholar] [CrossRef]
Pishnamazi, M.; Babanezhad, M.; Nakhjiri, A.T.; Rezakazemi, M.; Marjani, A.; Shirazian, S. ANFIS grid partition framework with difference between two sigmoidal membership functions structure for validation of nanofluid flow. Sci. Rep. 2020, 10, 15395. [Google Scholar] [CrossRef]
Sodeifian, G.; Niazi, Z. Prediction of CO₂ absorption by nanofluids using artificial neural network modeling. Int. Commun. Heat Mass Transf. 2021, 123, 105193. [Google Scholar] [CrossRef]
Aghilesh, K.; Mungray, A.; Agarwal, S.; Ali, J.; Garg, M.C. Performance optimisation of forward-osmosis membrane system using machine learning for the treatment of textile industry wastewater. J. Clean. Prod. 2021, 289, 125690. [Google Scholar] [CrossRef]
Ansarian, O.; Beiki, H. Nanofluids application to promote CO₂ absorption inside a bubble column: ANFIS and experimental study. Int. J. Environ. Sci. Technol. 2022, 19, 9979–9990. [Google Scholar] [CrossRef]
Banik, A.; Majumder, M.; Biswal, S.K.; Bandyopadhyay, T.K. Polynomial neural network-based group method of data handling algorithm coupled with modified particle swarm optimization to predict permeate flux (%) of rectangular sheet-shaped membrane. Chem. Pap. 2022, 76, 79–97. [Google Scholar] [CrossRef]
Nait Amar, M.; Djema, H.; Belhaouari, S.B.; Zeraibi, N. Toward robust models for predicting carbon dioxide absorption by nanofluids. Greenh. Gases Sci. Technol. 2022, 12, 537–551. [Google Scholar] [CrossRef]
Cooley, I.; Boobier, S.; Hirst, J.D.; Besley, E. Machine learning insights into predicting biogas separation in metal–organic frameworks. Commun. Chem. 2024, 7, 102. [Google Scholar] [CrossRef] [PubMed]
Amirkhani, F.; Dashti, A.; Abedsoltan, H.; Mohammadi, A.H.; Zhou, J.L.; Altaee, A. Modeling and estimation of CO₂ capture by porous liquids through machine learning. Sep. Purif. Technol. 2025, 359, 130445. [Google Scholar] [CrossRef]
Chen, B.; Dai, Y.; Ruan, X.; Xi, Y.; He, G. Integration of molecular dynamic simulation and free volume theory for modeling membrane VOC/gas separation. Front. Chem. Sci. Eng. 2018, 12, 296–305. [Google Scholar] [CrossRef]
Medi, B.; Vesali-Naseh, M.; Haddad-Hamedani, M. A generalized numerical framework for solving cocurrent and counter-current membrane models for gas separation. Heliyon 2022, 8, e09053. [Google Scholar] [CrossRef]
Khalilpour, R.; Abbas, A.; Lai, Z.; Pinnau, I. Analysis of hollow fibre membrane systems for multicomponent gas separation. Chem. Eng. Res. Des. 2013, 91, 332–347. [Google Scholar] [CrossRef]
Wang, Z.; Han, Y.; Cao, W. Simplified mixing rules for calculating transport coefficients of high-temperature air. Int. J. Aerosp. Eng. 2023, 2023, 7644738. [Google Scholar] [CrossRef]
Yerumbu, N.; Sahoo, R.K.; Sivalingam, M. Multiobjective optimization of membrane in hybrid cryogenic CO₂ separation process for coal-fired power plants. Environ. Sci. Pollut. Res. 2023, 30, 108783–108801. [Google Scholar] [CrossRef]
Khaghani, A.; Moghaddam, M.S.; Kheshti, M.F. Investigation of the liquid-vapor equilibrium in the mixture of carbon dioxide and normal alkanes in binary systems using the Peng–Robinson equation of state modified with mixing rules. Chem. Thermodyn. Therm. Anal. 2025, 17, 100161. [Google Scholar] [CrossRef]
Araújo, O.F.; de Medeiros, J.L.; Alves, R.M.B. CO₂ Utilization: A Process Systems Engineering Vision. In CO₂ Sequestration and Valorization; Morgado, C.R.V., Esteves, V.P.P., Eds.; IntechOpen: London, UK, 2014. [Google Scholar] [CrossRef]
Dasgupta, S.; Rajasekaran, M.; Roy, P.K.; Thakkar, F.M.; Pathak, A.D.; Ayappa, K.G.; Maiti, P.K. Influence of chain length on structural properties of carbon molecular sieving membranes and their effects on CO₂, CH₄ and N₂ adsorption: A molecular simulation study. J. Membr. Sci. 2022, 664, 121044. [Google Scholar] [CrossRef]
Castellani, B.; Gambelli, A.M.; Nicolini, A.; Rossi, F. Energy and Environmental Analysis of Membrane-Based CH₄-CO₂ Replacement Processes in Natural Gas Hydrates. Energies 2019, 12, 850. [Google Scholar] [CrossRef]
Da Conceicao, M.; Nemetz, L.; Rivero, J.; Hornbostel, K.; Lipscomb, G. Gas separation membrane module modeling: A comprehensive review. Membranes 2023, 13, 639. [Google Scholar] [CrossRef] [PubMed]
Hillman, F.; Wang, K.; Liang, C.Z.; Seng, D.H.L.; Zhang, S. Breaking the permeance–selectivity tradeoff for post-combustion carbon capture: A bio-inspired strategy to form ultrathin hollow fiber membranes. Adv. Mater. 2023, 35, 2305463. [Google Scholar] [CrossRef] [PubMed]
Chen, B.; Yang, T.; Xiao, W.; Nizamani, A.K. Conceptual design of pyrolytic oil upgrading process enhanced by membrane-integrated hydrogen production system. Processes 2019, 7, 284. [Google Scholar] [CrossRef]
Ruz-Hernandez, J.A.; Matsumoto, Y.; Arellano-Valmaña, F.; Pitalúa-Díaz, N.; Cabanillas-López, R.E.; Abril-García, J.H.; Herrera-López, E.J.; Velázquez-Contreras, E.F. Meteorological Variables’ Influence on Electric Power Generation for Photovoltaic Systems Located at Different Geographical Zones in Mexico. Appl. Sci. 2019, 9, 1649. [Google Scholar] [CrossRef]
Šverko, Z.; Vrankić, M.; Vlahinić, S.; Rogelj, P. Complex Pearson correlation coefficient for EEG connectivity analysis. Sensors 2022, 22, 1477. [Google Scholar] [CrossRef]
Salmerón Gómez, R.; García Pérez, J.; López Martín, M.D.M.; García, C.G. Collinearity diagnostic applied in ridge estimation through the variance inflation factor. J. Appl. Stat. 2016, 43, 1831–1849. [Google Scholar] [CrossRef]
Pitalúa-Díaz, N.; Arellano-Valmaña, F.; Ruz-Hernandez, J.A.; Matsumoto, Y.; Alazki, H.; Herrera-López, E.J.; Hinojosa-Palafox, J.F.; García-Juárez, A.; Pérez-Enciso, R.A.; Velázquez-Contreras, E.F. An ANFIS-Based Modeling Comparison Study for Photovoltaic Power at Different Geographical Places in Mexico. Energies 2019, 12, 2662. [Google Scholar] [CrossRef]
Eliwa, E.H.I.; El Koshiry, A.M.; Abd El-Hafeez, T.; Omar, A. Optimal gasoline price predictions: Leveraging the ANFIS regression model. Int. J. Intell. Syst. 2024, 2024, 8462056. [Google Scholar] [CrossRef]
Jang, J.S.R.; Sun, C.T.; Mizutani, E. Neuro-fuzzy and soft computing-a computational approach to learning and machine intelligence [Book Review]. IEEE Trans. Autom. Control. 1997, 42, 1482–1484. [Google Scholar] [CrossRef]
Fetimi, A.; Merouani, S.; Khan, M.S.; Asghar, M.N.; Yadav, K.K.; Jeon, B.-H.; Hamachi, M.; Kebiche-Senhaji, O.; Benguerba, Y. Modeling of textile dye removal from wastewater using innovative oxidation technologies (Fe(II)/chlorine and H₂O₂/periodate processes): Artificial neural network–particle swarm optimization hybrid model. ACS Omega 2022, 7, 13818–13825. [Google Scholar] [CrossRef]
Bird, R.B.; Stewart, W.E.; Lightfoot, E.N. Transport Phenomena, 2nd ed.; John Wiley: Hoboken, NJ, USA, 2002. [Google Scholar]

Figure 1. Basic structure of model-based optimization.

Figure 2. Countercurrent flow dynamics in an HFMM scheme.

Figure 3. Statistical analysis framework for predictor dimensionality reduction.

Figure 4. Architecture scheme of a two-input ANFIS network.

Figure 5. Validation of the proposed membrane model using experimental results and the Ko model. (a) Methane purity as a function of feed flow rate. (b) Methane recovery as a function of feed flow rate.

Figure 6. Relative error of the Ko model and the proposed model for the prediction of (a) methane purity and (b) methane recovery as a function of feed flow rate. Results correspond to the same experimental conditions as in Figure 5.

Figure 7. Mean absolute error (MAE) of the Ko model and the proposed model for the prediction of (a) methane purity and (b) methane recovery as a function of feed flow rate. Results correspond to the same experimental conditions as in Figure 5.

Figure 8. Pearson correlation matrix between all candidate retentate and permeate variables included in the variable selection procedure. The heatmap indicates the degree of linear relationship between each pair of variables.

Figure 9. (a) Absolute error for each sample in the validation dataset for the multiple linear regression model. (b) Histogram of residuals showing the distribution and magnitude of errors across all samples. These visualizations provide a direct assessment of the model’s prediction accuracy and the normality of residuals, which are essential for evaluating regression model assumptions.

Figure 10. (a) Residuals as a function of predicted values for a multiple linear regression model, highlighting any trends or heteroscedasticity in model errors. (b) Q-Q plot of residuals, used to assess the normality of errors and the adequacy of linear model assumptions.

Figure 11. Comparison of actual vs. predicted values for the multiple linear regression models. (a) Performance of the model with the initial set of 18 variables. (b) Performance of the reduced model with the 4 selected variables. The dashed line indicates the ideal 1:1 relationship.

Figure 12. Experimental values of retentate

{\hat{v}}_{{C H}_{4}}^{R e}

(

\hat{y}

) for all samples in the dataset. This variable is used as the output (target) in the ANFIS modeling.

Figure 13. Input variables used for the ANFIS model: (a)

{P m}_{{C H}_{4}}

(

x_{1}

); (b)

V^{R e}

(

x_{2}

); (c)

P^{R e}

(

x_{3}

); (d)

μ_{m i x}^{R e}

(

x_{4}

). Each plot shows the respective variable as a function of sample number for the entire dataset.

Figure 14. (a) Comparison between experimental (Real Train: solid blue line; Real Test: solid orange line) and predicted values (Predicted Train: dashed blue line; Predicted Test: dashed orange line) of

{\hat{v}}_{{C H}_{4}}^{R e}

obtained by the best ANFIS model (

d

= 4). The model is trained using 70% of the data and tested on the remaining 30%. Vertical dashed line indicates train/test split. (b) Residuals (prediction errors) for test set using best ANFIS model (

d

= 4).

Figure 15. Architecture of the single-rule ANFIS model for the optimal time lag (d = 3). This title specifies that the diagram represents the structure of the best-performing model identified through cross-validation, which uses a single fuzzy rule and a time delay of three samples.

Figure 16. (a) Evolution of the adaptive step size (k) and learning rate (η) across 50 training epochs. (b) Squared error per epoch, illustrating the model’s convergence behavior.

Figure 17. Validation of the Optimal ANFIS Model’s Predictive Performance. The plot shows the superposition of the actual (solid blue) and model-predicted (dashed yellow) values for the methane volume

{\hat{v}}_{{C H}_{4}}^{R e}

in the retentate across the entire dataset, demonstrating a near-perfect fit.

Table 1. Comparable studies.

Ref	Year	Domain	Model	Application	Gases	Key Metrics
[7]	2017	Polymeric hollow fiber (HF)	ANFIS (GA; ≈7 gaussmf MFs)	Predict permeate-side CO₂ for CH₄/CO₂ HF separation	CH₄/CO₂	R² ≈ 0.9993 RMSE ≈ 0.0064 AARD ≈ 1.25%
[8]	2018	MMM (FS/POSS/PDMS)	DE-ANFIS; CSA-LSSVM	Gas permeation (single-gas)	H₂/CH₄/CO₂/C₃H₈	DE-ANFIS: R² = 0.9981 (total); CSA-LSSVM (test): R² = 0.9689; CSA-LSSVM (train/test): R² = 0.9946/0.9689 MSE = 0.0003/0.0011, MAE = 0.0114/0.0257
[9]	2022	MMM (SAPO-34)	ANFIS (SC/FCM/GP)	Predict CO₂ permeability	CO₂	R² > 0.995 AARD < 3%
[10]	2023	MMM (PMP + nanoparticles)	MLP-ANN (3–8–1) MLP (3–8–1; BR)	CO₂ separation capability (CO₂ permeability)	CO₂	R = 0.99477 MAE = 6.87 AARD = 5.46% MSE = 152.75;
[11]	2023	HFMM	Empirical + regression	CO₂ enrichment/distribution	CO₂	R² ≈ 0.628

Table 2. Adjacent studies.

Ref	Year	Domain	Model	Application	Gases	Key Metrics
[12]	2015	Electroultrafiltration (EUF/HF (water)	TSK	Ni²⁺ removal under applied voltage	—	R² ≥ 0.98 (model fit) max rejection ≈ 60% (PBCC5 @ 3.5–4.5 V) 45% (PLCC5 @ 4 V)
[13]	2019	Adsorption (Cu-BTC)	PSO-ANFIS vs. ANN	C₃H₆/C₃H₈ adsorption	—	ANN: MAE = 0.111 PSO-ANFIS: MAE = 0.421
[14]	2020	Polymeric membranes	GP	Predict CO² solubility in PS, PVAc, PBS, PBSA	CO₂	R² > 0.98 ARD (per polymer) = 0.095%, 0.0503%, 0.0312%, 0.039% Split (train/test) = 70/30
[15]	2020	Bubble column (hydrodynamics)	ANFIS (grid, dsigmf)	Void fraction (CFD→ANFIS)	—	R = 0.9999 R_test = 0.64
[16]	2020	Nanofluids (CFD validation)	ANFIS (grid; 4 MFs/input → 64 rules; dsigmf)	Predict flow field from CFD → ANFIS (lid-driven square cavity, Cu/H₂O)	—	ANFIS R = 0.999 Train fraction ≈ 65% Max iters ≈ 800 ACO surrogate: R = 0.92
[17]	2021	Nanofluids	ANN (MLP, 6–9–1; LM)	CO₂ absorption in nanofluids (closed-vessel absorber)	CO₂	R = 0.9996 MSE = 2.36 × 10−⁵ MAE% = 0.326 N = 165
[18]	2021	Forward osmosis (textile wastewater)	RSM; ANN (FFBP-LM, 5–10–2); ANFIS (Sugeno; 5 inputs)	Predict/optimize water flux (Jw) and reverse salt flux (Js) with fertilizer draw solution	—	ANN (Jw): R² = 0.798 R = 0.8933 MAD = 0.8120 MSE = 1.4800 RMSE = 1.2170 ANN (Js): R² = 0.7807 R = 0.8836 MAD = 0.7940 MSE = 1.5220 RMSE = 1.2340
[19]	2022	Nanofluids (bubble column)	ANFIS-FCM (3 rules; hybrid)	CO₂ absorption in SiO₂/H₂O and Fe₃O₄/H₂O (output: St/kₗa)	CO²	ANFIS (70/15/15): R²_train = 0.998 R²_test = 0.997; RMSE_train = 0.0136 RMSE_test = 0.0194
[20]	2022	Sheet (water)	PNN-GMDH + PSO	Permeate-flux optimization	—	R² = 0.983 PI = 0.723 NSE = 0.984
[21]	2022	Nanofluids	GP; GMDH	CO₂ absorption correlations	CO₂	GP (all): R² = 0.9914 AARD = 3.732% RMSE = 0.0141 n = 230 Split = 80/20 GMDH (all): R² = 0.9726 AARD = 8.1134% RMSE = 0.0231
[22]	2024	Metal–Organic Frameworks (MOFs)	RF	Predicting biogas separation in MOFs	CH₄/CO₂	Metrics not reported in the main text
[23]	2025	Porous liquids	CSA-LSSVM; MLP; PSO-ANFIS; ANFIS	CO₂ solubility	CO₂	CSA-LSSVM: AARD = 3.17% MLP: AARD = 6.64% PSO-ANFIS: AARD = 8.67% ANFIS: AARD = 12.98%

Table 3. Initial conditions of the Process.

Parameter	Value	Unit
$P$	8	atm
$p$	1	atm
$T$	298.15	K
$v$	0.00135	mol/s
$D_{i}$	0.0002	m
$D_{o}$	0.0004	m
$N$	3800	-
$D$	0.0381	m
$L$	0.27	m

Values of the conditions process were obtained from [5].

Table 4. Initial conditions of chemical species.

Parameter	Mole Fraction	Permeance ( $P_{m i}$ ) $[m o l / (m^{2} \cdot a t m \cdot s)]$
CO₂	0.10	1.51 × 10⁻³
CH₄	0.90	5.17 × 10⁻⁵

Values of permeance were obtained from [5].

Table 5. Summary of ANFIS Model Configuration and Training Parameters.

Parameter	Description
x_m	Permeance Pm_CH₄, Retentate Volume V^Re, Retentate Pressure P^Re, Retentate Mixture Viscosity $μ_{m i x}^{R e}$
$\hat{y}$	Methane volume in retentate ${\hat{v}}_{{C H}_{4}}^{R}$
$d$	1 to 20 time lags to incorporate temporal dependencies in the inputs
Membership function type	Sigmoidal type
Number of Membership Functions	1 (sigmoidal)
Number or Fuzzy rules	1
Data Split	70% training, 30% testing, with no overlap between sets and cross valida-tion with 5 folds.
Training Method	Hybrid approach: linear parameter estimation via pseudoinverse combined with nonlinear optimization of parameters $a_{M F}$ $and c_{M F}$ by gradient descent using analytical derivatives
Parameter Initialization	Random initialization for parameters $a_{M F}$ $and c_{M F}$ within defined ranges.
Stopping Criteria	Maximum of 50 iterations.
Performance Metrics	Eval Time, MSE, RMSE, MAE, and coefficient of determination ( $R^{2}$ ) calculated for training and testing datasets
Best Model Selection	Based on lowest test set MSE among all evaluated $d$
Validation	Graphical comparison of predicted vs. actual values in training and testing; residual error analysis

Table 6. Initial individual variable regression metrics for model reduction.

Variable	$R^{2}$	MAE	${V I F}_{q, r i d g e}$
Variable permeance	0.4883	3.93 × 10⁻⁶	3.70
Retentate volume	0.9998	5.47 × 10⁻⁸	0.52
Permeate volume	0.9998	5.47 × 10⁻⁸	0.52
Retentate fraction CO₂	2.85 × 10⁻⁴	5.06 × 10⁻⁶	0.25
Retentate fraction CH₄	2.85 × 10⁻⁴	5.06 × 10⁻⁶	0.25
Permeate fraction CO₂	0.8520	1.76 × 10⁻⁶	0.24
Permeate fraction CH₄	0.8520	1.76 × 10⁻⁶	0.24
Retentate pressure	0.8356	2.26 × 10⁻⁶	3.32
Permeate pressure	0.8039	2.36 × 10⁻⁶	5.62
Retentate flow CO₂	0.8591	1.71 × 10⁻⁶	0.18
Retentate flow CH₄	1.0000	2.21 × 10⁻¹⁰	0.58
Permeate flow CO₂	0.8591	1.71 × 10⁻⁶	0.18
Permeate flow CH₄	1.0000	2.21 × 10⁻¹⁰	0.58
Retentate volume CO₂	0.8591	1.71 × 10⁻⁶	0.18
Permeate volume CO₂	0.8591	1.71 × 10⁻⁶	0.18
Permeate volume CH₄	1.0000	2.21 × 10⁻¹⁰	0.58
Retentate mix viscosity	0.4305	3.02 × 10⁻⁶	2.18
Permeate mix viscosity	0.8992	1.43 × 10⁻⁶	0.37

Table 7. Comparison of regression model performance metrics.

Metric	Value (18 Variables)	Value (4 Variables)
$R^{2}$	0.999	0.998
$M A E$	1.159 × 10⁻⁷	2.279 × 10⁻⁷

Table 8. The values of

β_{q}

.

Table 8. The values of

β_{q}

.

Parameter	MCS
$β_{0}$	−0.035
$β_{1}$	231
$β_{2}$	0.162
$β_{3}$	−0.162
$β_{4}$	6.26 × 10⁻¹²
$β_{5}$	−6.26 × 10⁻¹²
$β_{6}$	5.46 × 10⁻⁸
$β_{7}$	−5.46 × 10⁻⁸
$β_{8}$	0.00437
$β_{9}$	2.44 × 10⁻⁵
$β_{10}$	0.00329
$β_{11}$	0.00392
$β_{12}$	−0.00329
$β_{13}$	−0.00392
$β_{14}$	0.147
$β_{15}$	−0.147
$β_{16}$	−0.175
$β_{17}$	−0.0723
$β_{18}$	0.0734

Table 9. Criteria and Rationale for Selecting Input Variables for ANFIS Modeling.

Variable	Statistical Justification	Physical Justification	Multicollinearity	Reason for Selection
${P m}_{{C H}_{4}}$	$Moderate R^{2}$ $(0.49), low {V I F}_{q, r i d g e}$ (3.70), strong negative ρ with pressure and volume	Captures intrinsic membrane transport property	Low to moderate	Orthogonal information; essential transport parameter
$V^{R e}$	$Very high R^{2}$ $(0.9998), low MAE, low {V I F}_{q, r i d g e}$ (0.52)	Reflects process state, influences driving force	High with some variables	High predictive power; represents feed conditions
$P^{R e}$	$High R^{2}$ $(0.84), moderate {V I F}_{q, r i d g e}$ (3.32)	Key driver for mass transfer, operationally controlled	Moderate	Adds unique, mechanistically important information
$μ_{m i x}^{R e}$	Moderate $R^{2}$ $(0.43), acceptable {V I F}_{q, r i d g e}$ (2.18), low ρ with other selected variables	Influences mass transfer and flow resistance	Low	Mechanistic relevance; low redundancy

Table 11. Performance metrics for ANFIS models at different times,

d

. Metrics Eval Time, MSE, RMSE, MAE, R2 for both the training and test datasets. Best result is highlighted in bold.

Table 11. Performance metrics for ANFIS models at different times,

d

. Metrics Eval Time, MSE, RMSE, MAE, R2 for both the training and test datasets. Best result is highlighted in bold.

Delay	Eval Time (s)	MSE_Train	RMSE_Train	MAE_Train	R²_Train	MSE_Test	RMSE_Test	MAE_Test	R²_Test
1	0.0077	6.5682 × 10⁻¹⁵	8.1045 × 10⁻⁸	4.0207 × 10⁻⁸	0.9998	5.8827 × 10⁻¹⁵	7.6699 × 10⁻⁸	4.3855 × 10⁻⁸	0.9999
2	0.0078	6.5491 × 10⁻¹⁵	8.0926 × 10⁻⁸	4.0288 × 10⁻⁸	0.9998	5.8646 × 10⁻¹⁵	7.6581 × 10⁻⁸	4.3781 × 10⁻⁸	0.9999
3	0.0089	6.5080 × 10⁻¹⁵	8.0672 × 10⁻⁸	4.0305 × 10⁻⁸	0.9998	5.8353 × 10⁻¹⁵	7.6389 × 10⁻⁸	4.3838 × 10⁻⁸	0.9999
4	0.0111	6.4958 × 10⁻¹⁵	8.0597 × 10⁻⁸	4.0715 × 10⁻⁸	0.9998	5.8345 × 10⁻¹⁵	7.6384 × 10⁻⁸	4.3929 × 10⁻⁸	0.9999
5	0.0143	6.4895 × 10⁻¹⁵	8.0557 × 10⁻⁸	4.0732 × 10⁻⁸	0.9998	5.8459 × 10⁻¹⁵	7.6459 × 10⁻⁸	4.4128 × 10⁻⁸	0.9999
6	0.0142	6.4649 × 10⁻¹⁵	8.0405 × 10⁻⁸	4.0680 × 10⁻⁸	0.9998	5.8456 × 10⁻¹⁵	7.6457 × 10⁻⁸	4.4327 × 10⁻⁸	0.9999
7	0.0158	6.4602 × 10⁻¹⁵	8.0375 × 10⁻⁸	4.0815 × 10⁻⁸	0.9998	5.8696 × 10⁻¹⁵	7.6613 × 10⁻⁸	4.4559 × 10⁻⁸	0.9999
8	0.0188	6.4516 × 10⁻¹⁵	8.0322 × 10⁻⁸	4.1009 × 10⁻⁸	0.9998	5.9111 × 10⁻¹⁵	7.6884 × 10⁻⁸	4.4614 × 10⁻⁸	0.9999
9	0.0192	6.4121 × 10⁻¹⁵	8.0076 × 10⁻⁸	4.1117 × 10⁻⁸	0.9998	5.8620 × 10⁻¹⁵	7.6564 × 10⁻⁸	4.4542 × 10⁻⁸	0.9999
10	0.0215	1.7664 × 10⁻¹³	4.2029 × 10⁻⁷	1.9986 × 10⁻⁷	0.9958	2.3289 × 10⁻¹³	4.8259 × 10⁻⁷	2.5229 × 10⁻⁷	0.9952
11	0.0234	1.9624 × 10⁻¹²	1.4008 × 10⁻⁶	6.2475 × 10⁻⁷	0.9536	2.3208 × 10⁻¹²	1.5234 × 10⁻⁶	7.6129 × 10⁻⁷	0.9522
12	0.0260	3.4528 × 10⁻¹¹	5.8760 × 10⁻⁶	2.5711 × 10⁻⁶	0.1844	4.2499 × 10⁻¹¹	6.5192 × 10⁻⁶	3.2217 × 10⁻⁶	0.1263
13	0.0274	5.0096 × 10⁻¹¹	7.0779 × 10⁻⁶	3.0953 × 10⁻⁶	−0.1821	6.1502 × 10⁻¹¹	7.8423 × 10⁻⁶	3.8721 × 10⁻⁶	−0.2644
14	0.0289	4.8944 × 10⁻¹¹	6.9960 × 10⁻⁶	3.0619 × 10⁻⁶	−0.1536	6.0004 × 10⁻¹¹	7.7462 × 10⁻⁶	3.8249 × 10⁻⁶	−0.2336
15	0.0408	1.7002 × 10⁻¹²	1.3039 × 10⁻⁶	5.8830 × 10⁻⁷	0.9603	2.0688 × 10⁻¹²	1.4383 × 10⁻⁶	7.2201 × 10⁻⁷	0.9568
16	0.0436	5.1784 × 10⁻¹¹	7.1961 × 10⁻⁶	3.1518 × 10⁻⁶	−0.2097	6.1707 × 10⁻¹¹	7.8554 × 10⁻⁶	3.8599 × 10⁻⁶	−0.2889
17	0.0346	9.3228 × 10⁻¹²	3.0533 × 10⁻⁶	1.3496 × 10⁻⁶	0.7825	1.1175 × 10⁻¹¹	3.3429 × 10⁻⁶	1.6535 × 10⁻⁶	0.7666
18	0.0360	8.3914 × 10⁻¹²	2.8968 × 10⁻⁶	1.2885 × 10⁻⁶	0.8044	9.7818 × 10⁻¹²	3.1276 × 10⁻⁶	1.5537 × 10⁻⁶	0.7957
19	0.0381	4.9418 × 10⁻⁸	2.2230 × 10⁻⁴	9.8204 × 10⁻⁵	−1150.4074	5.8296 × 10⁻⁸	2.4145 × 10⁻⁴	1.1864 × 10⁻⁴	−1213.9289
20	0.0397	3.9225 × 10⁻¹¹	6.2629 × 10⁻⁶	2.7594 × 10⁻⁶	0.0871	4.6493 × 10⁻¹¹	6.8186 × 10⁻⁶	3.3415 × 10⁻⁶	0.0311

Table 12. Performance metrics for ANFIS models with different time

d

using cross-validation. Metrics Eval Time, MSE, RMSE, MAE, R² for both the training and test datasets. The best result is highlighted in bold.

Table 12. Performance metrics for ANFIS models with different time

d

using cross-validation. Metrics Eval Time, MSE, RMSE, MAE, R² for both the training and test datasets. The best result is highlighted in bold.

Delay	Eval Time (s)	MSE Test (Avg ± Std)	RMSE Test (Avg ± Std)	MAE Test (Avg ± Std)	R² Test (Avg ± Std)
1	0.0021	7.735 × 10⁻¹⁵ ± 3.356 × 10⁻¹⁵	8.642 × 10⁻⁸ ± 1.824 × 10⁻⁸	3.890 × 10⁻⁸ ± 4.691 × 10⁻⁹	0.9998 ± 0.0001
2	0.0014	8.301 × 10⁻¹⁵ ± 3.746 × 10⁻¹⁵	8.957 × 10⁻⁸ ± 1.865 × 10⁻⁸	4.036 × 10⁻⁸ ± 2.922 × 10⁻⁹	0.9998 ± 0.0001
3	0.0018	7.256 × 10⁻¹⁵ ± 4.286 × 10⁻¹⁵	8.276 × 10⁻⁸ ± 2.255 × 10⁻⁸	4.006 × 10⁻⁸ ± 2.732 × 10⁻⁹	0.9998 ± 0.0001
4	0.0027	8.387 × 10⁻¹⁵ ± 3.983 × 10⁻¹⁵	8.945 × 10⁻⁸ ± 2.193 × 10⁻⁸	4.145 × 10⁻⁸ ± 3.036 × 10⁻⁹	0.9998 ± 0.0001
5	0.0026	7.565 × 10⁻¹⁵ ± 3.810 × 10⁻¹⁵	8.472 × 10⁻⁸ ± 2.202 × 10⁻⁸	4.098 × 10⁻⁸ ± 3.133 × 10⁻⁹	0.9998 ± 0.0001
6	0.0030	7.581 × 10⁻¹⁵ ± 4.334 × 10⁻¹⁵	8.470 × 10⁻⁸ ± 2.257 × 10⁻⁸	4.087 × 10⁻⁸ ± 4.583 × 10⁻⁹	0.9998 ± 0.0001
7	0.0032	7.568 × 10⁻¹⁵ ± 3.840 × 10⁻¹⁵	8.508 × 10⁻⁸ ± 2.030 × 10⁻⁸	4.187 × 10⁻⁸ ± 2.673 × 10⁻⁹	0.9998 ± 0.0001
8	0.0034	7.524 × 10⁻¹⁵ ± 3.731 × 10⁻¹⁵	8.462 × 10⁻⁸ ± 2.132 × 10⁻⁸	4.169 × 10⁻⁸ ± 3.158 × 10⁻⁹	0.9998 ± 0.0001
9	0.0037	8.011 × 10⁻¹⁵ ± 3.050 × 10⁻¹⁵	8.812 × 10⁻⁸ ± 1.754 × 10⁻⁸	4.281 × 10⁻⁸ ± 3.534 × 10⁻⁹	0.9998 ± 0.0001
10	0.0040	7.718 × 10⁻¹⁵ ± 3.231 × 10⁻¹⁵	8.649 × 10⁻⁸ ± 1.720 × 10⁻⁸	4.302 × 10⁻⁸ ± 2.830 × 10⁻⁹	0.9998 ± 0.0001
11	0.0044	6.149 × 10⁻¹² ± 9.529 × 10⁻¹²	1.681 × 10⁻⁶ ± 2.039 × 10⁻⁶	8.087 × 10⁻⁷ ± 1.019 × 10⁻⁶	0.8764 ± 0.1741
12	0.0049	1.210 × 10⁻¹¹ ± 1.793 × 10⁻¹¹	2.259 × 10⁻⁶ ± 2.957 × 10⁻⁶	1.056 × 10⁻⁶ ± 1.399 × 10⁻⁶	0.7511 ± 0.3601
13	0.0054	1.184 × 10⁻¹¹ ± 1.856 × 10⁻¹¹	2.788 × 10⁻⁶ ± 2.255 × 10⁻⁶	1.253 × 10⁻⁶ ± 9.445 × 10⁻⁷	0.7058 ± 0.4847
14	0.0056	1.527 × 10⁻¹¹ ± 1.386 × 10⁻¹¹	3.359 × 10⁻⁶ ± 2.234 × 10⁻⁶	1.573 × 10⁻⁶ ± 1.078 × 10⁻⁶	0.6688 ± 0.2729
15	0.0060	1.627 × 10⁻¹⁰ ± 3.052 × 10⁻¹⁰	9.206 × 10⁻⁶ ± 9.873 × 10⁻⁶	4.044 × 10⁻⁶ ± 4.061 × 10⁻⁶	−3.3766 ± 8.4932
16	0.0066	3.130 × 10⁻¹¹ ± 2.162 × 10⁻¹¹	5.273 × 10⁻⁶ ± 2.088 × 10⁻⁶	2.396 × 10⁻⁶ ± 9.865 × 10⁻⁷	0.2707 ± 0.4788
17	0.0067	3.619 × 10⁻¹¹ ± 1.319 × 10⁻¹¹	5.938 × 10⁻⁶ ± 1.077 × 10⁻⁶	2.706 × 10⁻⁶ ± 4.863 × 10⁻⁷	0.1866 ± 0.2695
18	0.0070	2.568 × 10⁻¹¹ ± 2.965 × 10⁻¹¹	4.008 × 10⁻⁶ ± 3.468 × 10⁻⁶	1.840 × 10⁻⁶ ± 1.585 × 10⁻⁶	0.4516 ± 0.6191
19	0.0074	7.472 × 10⁻¹⁰ ± 1.654 × 10⁻⁹	1.424 × 10⁻⁵ ± 2.608 × 10⁻⁵	6.721 × 10⁻⁶ ± 1.244 × 10⁻⁵	−14.9436 ± 35.2595
20	0.0082	9.743 × 10⁻¹² ± 1.215 × 10⁻¹¹	2.649 × 10⁻⁶ ± 1.846 × 10⁻⁶	1.213 × 10⁻⁶ ± 7.754 × 10⁻⁷	0.7801 ± 0.2853

Table 13. Cross-validation performance metrics by fold for the optimal ANFIS model (

d

= 3).

Table 13. Cross-validation performance metrics by fold for the optimal ANFIS model (

d

= 3).

Fold	MSE Test	RMSE Test	MAE Test	R² Test
1	6.781 × 10⁻¹⁵	8.235 × 10⁻⁸	4.209 × 10⁻⁸	0.9999
2	3.406 × 10⁻¹⁵	5.836 × 10⁻⁸	3.532 × 10⁻⁸	0.9999
3	6.108 × 10⁻¹⁵	7.815 × 10⁻⁸	3.996 × 10⁻⁸	0.9999
4	1.430 × 10⁻¹⁴	1.196 × 10⁻⁷	4.307 × 10⁻⁸	0.9996
5	7.742 × 10⁻¹⁵	8.799 × 10⁻⁸	4.179 × 10⁻⁸	0.9998

Table 14. Inference Time Analysis for the Optimal ANFIS Model.

# Samples	Average Time	Standard Deviation	Minimum Time	Maximum Latency
1	0.3930 ms	0.0000 ms	0.3930 ms	0.3930 ms
10	0.1252 ms	0.2486 ms	0.0080 ms	0.8190 ms
100	0.0085 ms	0.0025 ms	0.0080 ms	0.0250 ms
500	0.0086 ms	0.0023 ms	0.0070 ms	0.0370 ms

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Mathematical and Neuro-Fuzzy Modeling of a Hollow Fiber Membrane System for a Petrochemical Process

Abstract

1. Introduction

2. Materials and Methods

2.1. Mathematical Modeling of HFMM

2.1.1. Dynamic Determination of Viscosity

2.1.2. Peng–Robinson Equation of State for Non-Ideal Gas Behavior

2.1.3. Permeance: Baseline Assumption and Post-Validation Exploration

2.2. Statistical Method

2.2.1. Data Normalization

2.2.2. Normalized Multiple Linear Regression

2.2.3. Parameter Estimation via Gradient Descent

2.2.4. Multicollinearity Analysis

2.2.5. Validation of Model Assumptions via Diagnostic Plots

2.3. Intelligent Modeling with ANFIS

3. Results

3.1. Mathematical Modeling of HFMM Results and Validation

3.2. Statistical Method Results

3.3. ANFIS Simulation Results

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

Nomenclature

Appendix A

References

Article Metrics

Citations

Article Access Statistics