Interpretable Performance Prediction for Wet Scrubbers Using Multi-Gene Genetic Programming: An Application-Oriented Study

Zhu, Linling; Zhu, Ruhua; Zhou, Jun; Luo, Huiqing; Li, Xiaochuan; Wei, Tao

doi:10.3390/math14071142

by Linling Zhu ¹, Ruhua Zhu ¹,², Jun Zhou ¹,³, Huiqing Luo ¹,⁴, Xiaochuan Li ¹ and Tao Wei ¹,*

¹ Occupational Health Institute, China Academy of Safety Science and Technology, Beijing 100012, China

² School of Emergency Management and Safety Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China

³ School of Chemical Engineering and Technology, China University of Mining and Technology, Xuzhou 221116, China

⁴ School of Mechanical Engineering, Sichuan University of Science and Engineering, Zigong 643000, China

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(7), 1142; https://doi.org/10.3390/math14071142

Submission received: 23 February 2026 / Revised: 22 March 2026 / Accepted: 24 March 2026 / Published: 29 March 2026

(This article belongs to the Special Issue Modeling, Simulation, Control and Optimization in Engineering with Applications, 2nd Edition)

Abstract

The removal efficiency of wet scrubbers is governed by complex nonlinear interactions among operating parameters such as liquid level, airflow velocity, and dust concentration. These interactions make accurate real-time prediction challenging, which in turn leads to operational instability, increased energy consumption, and excessive emissions. To address this bottleneck, we introduce multi-gene genetic programming (MGGP) to develop interpretable models that quantify multi-parameter coupling and predict removal efficiency for PM₁, PM_2.5, PM₁₀, and TSP. Key input variables, including liquid level height, inlet airflow velocity, system pressure, and inlet dust concentration, were identified via correlation analysis, and explicit mathematical models were derived. Global sensitivity analysis using the elementary effect test (EET) identified inlet airflow velocity as the most influential variable. Uncertainty quantification via quantile regression (QR) indicated that the models are reliable, with narrow prediction intervals and high coverage probabilities. Compared with extreme gradient boosting (XGBoost) and multiple nonlinear regression (MNR), MGGP offers a favorable balance of accuracy, generalization, and interpretability. Its explicit form quantifies parameter interactions, enabling efficient on-site monitoring with low computational cost. This study provides an interpretable prediction tool for intelligent wet scrubber operation, supporting cleaner production and refined control in complex industrial processes.

Keywords:

wet scrubber; multi-parameter coupling; MGGP; interpretable prediction; dust removal efficiency

MSC:

76T30

1. Introduction

In mining, metallurgy, and other heavily polluting industries, high-concentration dust is continuously generated during ore crushing, screening, and material transfer processes, with its inhalable fraction often rich in pathogenic constituents such as free silica. Long-term occupational exposure not only leads to a high incidence of pneumoconiosis but also significantly increases the risk of cardiovascular and respiratory diseases [1,2,3]. Studies indicate that pneumoconiosis remains a prominent occupational health concern worldwide [4,5,6]. To address this serious challenge, occupational health and environmental regulations have become increasingly stringent both domestically and internationally, compelling enterprises to adopt efficient control technologies and achieve cleaner production. In this context, wet scrubbers have emerged as core equipment for achieving dust emission compliance due to their high efficiency in capturing moist, adhesive, and toxic dust [7,8,9]. However, current monitoring of their dust removal performance still lacks methods capable of real-time analysis of multi-parameter coupling mechanisms, making it difficult to meet the growing demands of health-oriented regulatory requirements and refined process control. Therefore, developing a performance prediction model that combines high accuracy with interpretability has become essential to advance wet scrubbers from passive treatment toward proactive prevention.

The dust removal efficiency of wet scrubbers is a core performance indicator that directly determines their environmental benefits and operational economy. To continuously improve this metric, existing research and practice have mainly progressed along two directions. First, computational fluid dynamics simulations and numerical analysis help to reveal the complex internal gas–liquid–solid three-phase flow structures, particle trajectories, and dust removal mechanisms, thereby guiding design optimization and pre-optimization of operating parameters [10,11,12]. Second, efforts focus on structural innovation and integrated design of the equipment itself, such as developing multi-stage composite scrubbers that combine mechanisms like cyclonic separation, packing, and electrostatic precipitation, or optimizing key internal components such as flow guide plates and atomizing nozzles [13,14,15]. These efforts aim to enhance gas–liquid contact and broaden the efficient capture window for dust across multiple particle size ranges, especially fine particles such as PM_2.5 and PM₁. However, the ultimate effectiveness of both in-depth mechanistic studies and optimized structural designs depends heavily on accurate real-time evaluation and feedback of dust removal efficiency under actual operating conditions. In practice, the removal efficiency is strongly influenced by nonlinear coupling among multiple parameters such as liquid level height, airflow velocity, dust concentration, and particle size distribution. For instance, high airflow velocity may lead to droplet entrainment losses, while the capture mechanisms for dust particles of different sizes, such as diffusion, interception, and inertial impaction, exhibit markedly different sensitivities to the same operating parameter [16,17,18,19]. This complexity makes it extremely difficult to construct universal and accurate real-time prediction models based purely on physical mechanisms.
Compared with the active research on equipment improvement, the development of real-time monitoring and intelligent interpretation technologies for dust-removal efficiency has lagged significantly. Existing methods, whether direct approaches such as offline filter-membrane gravimetry [20] or online optical sensor monitoring [21], or inversion techniques based on pressure or image signals [22,23], all exhibit inherent limitations when applied to real-time condition monitoring under complex working conditions. The former cannot achieve continuous online monitoring or suffers from low accuracy due to moisture interference, while the latter is constrained by errors in obtaining indirect indicators, making it difficult to directly support real-time process optimization, control, or fault diagnosis. This bottleneck seriously restricts the improvement of intelligent operation and maintenance levels for wet scrubbers and undermines the enhancement of cleaner production capacity.

To break through the bottlenecks of existing monitoring methods in terms of real-time capability, accuracy, interpretability, and especially the explicit quantification of multi-parameter coupling relationships, it is imperative to introduce a new modeling paradigm capable of autonomously extracting physical mechanisms from operational data and producing transparent mathematical expressions. Symbolic regression, as an important branch of interpretable artificial intelligence, offers a promising path to solving such problems. It does not rely on pre-defined functional forms; instead, the algorithm automatically searches and combines mathematical operators to directly uncover the hidden mathematical relationships between input variables and the output target. The final outcome is an explicit, analytically tractable and differentiable formula [24]. Among the various symbolic regression methods, MGGP stands out due to its distinctive advantages. MGGP constructs the final prediction model by linearly combining multiple relatively simple genetic programming individuals (i.e., “genes”) with weighted coefficients. This structure enables it to retain the strong expressive power and interpretability of traditional genetic programming, while also effectively controlling the overall complexity through the combination of multiple relatively simple sub-models. As a result, when fitting high-dimensional nonlinear data, MGGP achieves a better balance among accuracy, simplicity, and generalization capability [25,26,27]. In recent years, MGGP has demonstrated its outstanding capability in constructing high-accuracy explicit models across multiple fields such as geotechnical parameter prediction, structural reliability analysis, and complex chemical process modeling [28,29,30,31]. 
This clearly indicates that the inherent strength of MGGP, extracting clear mathematical rules from complex data, aligns well with the core need in wet-scrubber performance monitoring to clarify the mechanisms underlying multi-parameter coupling effects, thereby providing a methodological foundation for this study.

Therefore, this work proposes a new performance prediction method for wet scrubbers based on MGGP, aiming to address the technical bottlenecks of difficulty in quantifying multi-parameter coupling and achieving real-time, accurate monitoring in efficiency assessment. First, through experimental and correlation analyses, the liquid-level height, inlet airflow velocity, system pressure, and inlet dust concentration are identified as the core input variables. Subsequently, MGGP is applied to construct explicit and interpretable prediction models for the dust-removal efficiency of four characteristic particle-size classes: PM₁, PM_2.5, PM₁₀, and TSP. Furthermore, single-factor analysis, the EET, and QR are integrated to systematically quantify variable sensitivity and prediction uncertainty. The model performance is further compared with methods such as XGBoost and MNR, comprehensively demonstrating the advantages of the proposed approach across multiple dimensions including prediction accuracy, generalization capability, and interpretability, thereby validating the model’s effectiveness and reliability. This study not only provides a directly deployable quantitative tool for the intelligent monitoring and optimized operation of wet scrubbers, but also offers an innovative methodological framework for constructing transparent digital twins of complex industrial processes.

2. Materials and Methods

2.1. Experimental System and Data Acquisition

The experimental data used in this study are derived from a published industrial field test conducted by the research group, which aimed to evaluate the cleaning performance of a gas–liquid entrainment wet scrubber for dust generated at the feed inlet of a cone crusher in an ore-crushing workshop [32]. The test was carried out in an ore-crushing workshop with a daily processing capacity of approximately 50,000 t. The scrubber was designed for a maximum handling air volume of 5000 m³/h. The overall test system mainly consists of the wet-scrubber unit, a semi-enclosed dust-collection hood, and a parameter-adjustment and data-acquisition unit. The core scrubber unit was connected to the dust-generation point at the feed inlet of the cone crusher, and dust diffusion was controlled by maintaining a negative pressure inside the collection hood. The system was equipped with precision instrumentation: the inlet air velocity (v) was controlled by adjusting the fan frequency and monitored with an anemometer (TSI-9545; TSI Incorporated, Shoreview, MN, USA; accuracy of ±0.015 m/s or ±3% of reading); the liquid-level height (h) was accurately set via an external water-supply system; the inlet and outlet dust mass concentrations were measured synchronously by a pair of laser-scattering dust-concentration sensors (SDS069; Nova Fitness Co., Ltd., Qingdao, Shandong, China; accuracy of ±10% of reading for PM₁, PM_2.5, and PM₁₀, and ±18% of reading for TSP) of the same brand and model, enabling accurate calculation of the removal efficiency (η). The dust-concentration sensors can simultaneously measure concentration data for four characteristic particle-size ranges: PM₁, PM_2.5, PM₁₀, and TSP; the system pressure (p) was acquired using high-frequency pressure sensors (LU–THG03P5M3U2; Xiamen Anthone Electronics Co., Ltd., Xiamen, Fujian, China; accuracy of ±0.5% FS) installed at the inlet and outlet [32], as shown in Figure 1 [32].

To collect scrubber operating data under all working conditions, the data acquisition method followed essentially the same procedure as in Wei et al., 2026 [32]. The initial liquid level was set at −60, −30, −15, 0, 15, 30, and 45 mm. For each initial liquid level, the fan frequency was increased stepwise from low to high, resulting in a total of 52 operating conditions. For each condition, the inlet air velocity, pressure drop, inlet dust concentration, and outlet dust concentration of the scrubber were measured simultaneously so that the removal efficiency could be calculated. The pressure sensors operated at 1024 Hz for 5 s per condition, yielding 5120 pressure measurements per condition for subsequent statistical analysis. To increase the experimental data volume and improve the accuracy of removal-efficiency prediction, the sampling frequency of the dust-concentration sensors was set to 0.1 Hz in this study, and the sampling duration for each condition was set to 180 s. This gives 18 readings per condition and a total of 936 experimental sample sets.
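The sample counts quoted above follow directly from the sampling rates and durations. The short check below reproduces that arithmetic; the variable names are illustrative and not taken from the original study.

```python
# Acquisition settings quoted in the text (variable names are illustrative).
n_conditions = 52          # operating conditions
pressure_rate_hz = 1024    # pressure-sensor sampling rate
pressure_window_s = 5      # pressure acquisition window per condition
dust_rate_hz = 0.1         # dust-sensor sampling rate
dust_window_s = 180        # dust acquisition window per condition

pressure_samples_per_condition = pressure_rate_hz * pressure_window_s
dust_samples_per_condition = round(dust_rate_hz * dust_window_s)
total_sample_sets = n_conditions * dust_samples_per_condition

print(pressure_samples_per_condition)  # 5120
print(dust_samples_per_condition)      # 18
print(total_sample_sets)               # 936
```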

2.2. Data Preprocessing Methods

(1) Removal Efficiency of the Wet Scrubber

The removal efficiency is the key performance metric for quantifying the dust-removal performance of a wet scrubber, defined as follows:

η = \left(1 - \frac{c_{1}}{c}\right) \times 100\%

(1)

where η is the removal efficiency (%), c is the inlet dust concentration (μg/m³), and c₁ is the outlet dust concentration (μg/m³).
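As a minimal sketch of Equation (1), the function below computes the removal efficiency from one pair of inlet and outlet readings. The function name and the example concentrations are hypothetical and are not measurements from the field test.

```python
def removal_efficiency(c: float, c1: float) -> float:
    """Removal efficiency in percent, following Equation (1).

    c  : inlet dust concentration (ug/m^3)
    c1 : outlet dust concentration (ug/m^3)
    """
    if c <= 0:
        raise ValueError("inlet concentration must be positive")
    return (1.0 - c1 / c) * 100.0

# Hypothetical readings, chosen only to illustrate the calculation.
eta = removal_efficiency(c=2000.0, c1=100.0)
print(f"{eta:.1f} %")
```

In practice the function would be applied to each synchronized inlet/outlet pair and to each particle-size channel separately.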

(2) Statistical Features of the Wet Scrubber Pressure Signal

The statistical features of the wet scrubber pressure signal mainly include the mean pressure (p) and the standard deviation (σ_p), which are calculated as shown in Equations (2) and (3), respectively:

p = \frac{1}{n} \sum_{i = 1}^{n} p_{i}

(2)

σ_{p} = \sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} {(p_{i} - p)}^{2}}

(3)

where p_i is the instantaneous pressure value, Pa, and n is the number of samples.
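Equations (2) and (3) map onto standard NumPy calls. The sketch below assumes the raw pressure window is available as a one-dimensional array; the synthetic signal and the function name are illustrative stand-ins, not data from the study.

```python
import numpy as np

def pressure_features(p_samples):
    """Return (mean pressure, sample standard deviation) per Equations (2)-(3)."""
    p = np.asarray(p_samples, dtype=float)
    p_mean = p.mean()
    sigma_p = p.std(ddof=1)  # ddof=1 gives the 1/(n-1) normalisation of Eq. (3)
    return float(p_mean), float(sigma_p)

# Synthetic signal standing in for one 5 s window at 1024 Hz (5120 points).
rng = np.random.default_rng(0)
signal = 1500.0 + 20.0 * rng.standard_normal(5120)
p_mean, sigma_p = pressure_features(signal)
print(p_mean, sigma_p)
```

Note that NumPy's default `ddof=0` would divide by n rather than n − 1, so the argument has to be set explicitly to match Equation (3).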

2.3. MGGP

Genetic programming is a machine learning method inspired by Darwin’s “survival of the fittest” mechanism of natural selection [24]. Compared with traditional statistical regression and other machine learning methods, genetic programming neither requires predefined functional forms for potential relationships among data (e.g., fixed linear or logarithmic relationships) nor relies on domain prior knowledge; instead, it can generate predictive equations directly from data characteristics [25]. MGGP is essentially an extended version of genetic programming that enhances its capability by linearly combining the outputs of individual genetic programming models using weighted coefficients calculated via the least-squares method [25]. Figure 2 illustrates the structure of an MGGP model comprising three genes and four input variables, where the final output is obtained by linearly combining these genes with weighted coefficients. Compared with conventional genetic programming, MGGP improves the ability to fit high-dimensional, nonlinear, multivariable problems while preserving interpretability [26,27].
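The least-squares weighting step can be illustrated in a few lines. The sketch below is not the authors' implementation: the three genes are hand-written placeholders rather than evolved trees, the bias column is an assumption (suggested by the constant terms in the final models), and the data are synthetic.

```python
import numpy as np

# Three hand-written "genes" standing in for evolved expression trees
# (hypothetical; they are not the genes of the fitted models).
GENES = [
    lambda h, v, p, c: h * v,
    lambda h, v, p, c: np.cos(p),
    lambda h, v, p, c: c / (1.0 + v**2),
]

def design_matrix(X):
    """Column of ones (bias) followed by one column per gene output."""
    h, v, p, c = X.T
    return np.column_stack([np.ones(len(X))] + [g(h, v, p, c) for g in GENES])

def fit_gene_weights(X, y):
    """Least-squares bias and gene weights for y ~ d0 + sum_k d_k * gene_k(X)."""
    weights, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)
    return weights

# Synthetic data generated from known weights, so the fit can be checked.
rng = np.random.default_rng(1)
X = rng.uniform(0.5, 2.0, size=(200, 4))
true_w = np.array([90.0, 1.5, -2.0, 0.7])
y = design_matrix(X) @ true_w
weights = fit_gene_weights(X, y)
print(np.round(weights, 3))
```

Because the weights enter linearly, they are obtained in closed form for any candidate set of genes, so the evolutionary search only has to discover the gene structures.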

The core modeling process of MGGP is illustrated in Figure 3. First, an initial population is constructed based on randomly generated individuals containing multiple genes. Then, individuals with higher fitness are selected as parents according to a fitness function (usually RMSE). Subsequently, crossover and mutation operations are performed on the selected parent individuals to generate offspring with new gene combinations. Finally, the current iteration state is checked: if the fitness of an individual meets the preset threshold, the optimal individual is output; otherwise, the process from fitness evaluation to genetic operation is repeated until either the fitness threshold is reached or the maximum number of iterations is attained.
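The loop described above (initialise, evaluate, select, recombine, mutate, check termination) can be sketched with a deliberately simplified stand-in. In the example below, each individual is a tuple of indices into a fixed library of basis functions instead of a set of evolving expression trees, so it shows only the control flow and gene-level crossover and mutation, not full MGGP. All names, rates, and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed library of candidate genes (a stand-in for evolving expression trees).
LIBRARY = [
    lambda x: x[:, 0],
    lambda x: x[:, 1],
    lambda x: x[:, 0] * x[:, 1],
    lambda x: np.cos(x[:, 0]),
    lambda x: np.sin(x[:, 1]),
    lambda x: x[:, 0] ** 2,
]
N_GENES = 2

def rmse_of(individual, X, y):
    """Fitness: RMSE after least-squares weighting of the individual's genes."""
    G = np.column_stack([np.ones(len(y))] + [LIBRARY[k](X) for k in individual])
    w, *_ = np.linalg.lstsq(G, y, rcond=None)
    return float(np.sqrt(np.mean((G @ w - y) ** 2)))

def evolve(X, y, pop_size=20, max_gen=30, threshold=1e-8):
    pop = [tuple(int(k) for k in rng.integers(len(LIBRARY), size=N_GENES))
           for _ in range(pop_size)]
    history = []
    for _ in range(max_gen):
        ranked = sorted(pop, key=lambda ind: rmse_of(ind, X, y))
        best = ranked[0]
        history.append(rmse_of(best, X, y))
        if history[-1] <= threshold:       # fitness threshold reached
            break
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [best]                  # elitism keeps the current best
        while len(children) < pop_size:
            a, b = (parents[i] for i in rng.integers(len(parents), size=2))
            child = [a[0], b[1]]           # gene-level crossover
            if rng.random() < 0.3:         # mutation: swap in a random gene
                child[rng.integers(N_GENES)] = rng.integers(len(LIBRARY))
            children.append(tuple(int(k) for k in child))
        pop = children
    return best, history

# Synthetic target built from two library genes plus a constant.
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] * X[:, 1] - np.cos(X[:, 0])
best, history = evolve(X, y)
print(best, history[-1])
```

Elitism guarantees that the best RMSE never worsens from one generation to the next, which is why `history` is non-increasing.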

3. MGGP Modeling

3.1. Selection of Input Variables

3.1.1. Variable Correlation Analysis Method

In industrial applications, liquid-level height and fan frequency are key parameters for the operational control of wet scrubbers, while inlet airflow velocity and dust concentration are the primary process-dependent parameters during state regulation. However, the gas–liquid–dust three-phase system exhibits high fluidity, leading to exceptionally complex inter-phase coupling. Therefore, relying solely on the aforementioned key parameters is insufficient for achieving accurate prediction of the dust-removal efficiency. To address this limitation, this study further incorporates two statistical features, the central tendency and intensity of pressure fluctuations reflecting the gas–liquid two-phase behavior within the scrubber, as additional model input variables, thereby enabling a more comprehensive characterization of the gas–liquid–dust coupling process. It should be noted, however, that there exist inherent correlations among the parameters corresponding to these expanded input variables. Some variables may even interfere with the dust removal efficiency prediction model, thereby reducing its predictive accuracy. To mitigate such interference, this study first conducts correlation analysis between the input variables and the output variable (dust removal efficiency) in the training set to identify potential redundant variables. On this basis, the MGGP approach is further employed to exclude the influence of irrelevant variables, ultimately laying the groundwork for constructing a more robust prediction model for dust-removal efficiency.

(1) Linear correlation quantification

The Pearson correlation coefficient r is used to quantify the degree of linear correlation between any two variables X and Y. It is calculated as follows:

r_{X Y} = \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{2} \sum_{i = 1}^{n} {(Y_{i} - \bar{Y})}^{2}}}

(4)

where \bar{X} and \bar{Y} are the sample means of variables X and Y, respectively. r_XY ∈ [−1, 1]; an absolute value closer to 1 indicates a stronger linear correlation, and the sign indicates the direction (positive or negative).
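Equation (4) can be written out directly and cross-checked against NumPy's built-in correlation routine. The data below are synthetic and serve only to illustrate the calculation.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient, term by term as in Equation (4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Synthetic, positively correlated pair of variables.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 0.5 * x + rng.normal(size=300)
r_manual = pearson_r(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])
print(r_manual, r_numpy)
```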

(2) Joint distribution analysis based on kernel density estimation

To move beyond the assumption of linear correlation and uncover the more complex dependency structures between variable pairs, this study introduced joint distribution analysis based on kernel density estimation. Kernel density estimation is a non-parametric method for estimating the probability density function. It does not presume that the data follow a specific distribution form but is entirely driven by the data itself, using smooth kernel functions to estimate the probability density of random variables. In bivariate analysis, kernel density estimation generates a continuous, smooth joint probability density surface, thereby depicting the “co-occurrence” likelihood of two variables in a two-dimensional space. The computational principle of two-dimensional kernel density estimation is as follows:

For a given set of n two-dimensional data points (X_i, Y_i), the estimated joint probability density \hat{f}_{h_X, h_Y}(x, y) at any point (x, y) is given by:

\hat{f} (x, y) = \frac{1}{n h_{X} h_{Y}} \sum_{i = 1}^{n} K (\frac{x - X_{i}}{h_{X}}) K (\frac{y - Y_{i}}{h_{Y}})

(5)

where K (·) is the kernel function. h_X and h_Y are the bandwidths for the X and Y variables, respectively.
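A minimal sketch of Equation (5) is given below. The text does not state which kernel or bandwidths were used, so the Gaussian kernel and the bandwidth values here are assumptions made purely for illustration, and the data are synthetic.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as K(.) (an assumed choice)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde2d(x, y, X, Y, h_x, h_y):
    """Joint density estimate of Equation (5) at the point (x, y)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    k = gaussian_kernel((x - X) / h_x) * gaussian_kernel((y - Y) / h_y)
    return float(k.sum() / (X.size * h_x * h_y))

# Synthetic correlated sample; the bandwidths are arbitrary illustrative values.
rng = np.random.default_rng(0)
X = rng.normal(size=500)
Y = 0.8 * X + 0.6 * rng.normal(size=500)
density_on_axis = kde2d(1.0, 0.8, X, Y, h_x=0.3, h_y=0.3)
density_off_axis = kde2d(1.0, -0.8, X, Y, h_x=0.3, h_y=0.3)
print(density_on_axis, density_off_axis)
```

Evaluating `kde2d` on a regular grid and drawing level sets of the result yields the kind of contour plot interpreted in the following paragraphs.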

Linear association: If a strong linear relationship exists between variables, the density contours appear as slender, densely packed ellipses. The direction of the major axis indicates the sign of the correlation (from bottom-left to top-right for positive, from top-left to bottom-right for negative). The flatter and more elongated the ellipse, the stronger the linear association.

Nonlinear association: When contours exhibit shapes such as curved “bananas,” “crescents,” or more complex forms, it strongly suggests a nonlinear dependency between the variables.

3.1.2. Input Variable Selection Result

The selection of input variables in this study follows a physics-informed modeling philosophy, where domain knowledge of wet scrubber operating mechanisms guides the data-driven feature selection process. This approach unfolds in three interconnected stages. First, based on established understanding of wet scrubber mechanisms, we identified candidate variables governing key physical processes: liquid level height (h) controls the generation of collecting elements (water films and droplets); inlet airflow velocity (v) determines particle transport and gas–liquid contact time; system pressure (p) reflects energy consumption and flow regime; inlet dust concentration (c) influences particle agglomeration and collection probability. Fan frequency (f) and pressure standard deviation (σ_p) were also considered as they relate to operational control and flow fluctuations. The system pressure reflects the baseline pressure level associated with energy consumption and overall flow resistance, while the standard deviation quantifies the intensity of pressure fluctuations and serves as an indicator of gas–liquid flow dynamics, including bubble formation, droplet entrainment, and turbulence intensity [17]. These physically motivated candidates were then subjected to correlation analysis to quantify their linear and nonlinear relationships with removal efficiency and to identify potential redundancies among variables. This statistical validation step ensures that the final selection is not only physically meaningful but also empirically justified. Finally, the optimal input set (Table 1) was determined by balancing statistical evidence with physical interpretability, ensuring that the models are built on mechanistically meaningful foundations while avoiding multicollinearity and information redundancy.

To validate this physically motivated candidate set and identify potential redundancies, we performed correlation analysis on the training data. Figure 4 presents a correlation and distribution matrix plot of all candidate input variables (h, f, v, c, p, σ_p) and the output variable (η). In this matrix, the diagonal shows histograms of individual variable distributions; the upper-triangular region displays contour plots of pairwise density distributions; the lower-triangular region combines scatter plots with Pearson correlation coefficients (r), quantifying the strength and direction of linear relationships for each variable pair.

This hybrid approach, physical reasoning first, statistical validation second, ensures that the final input set is both mechanistically meaningful and statistically robust. Based on the correlation analysis and subsequent MGGP modeling tests, the combination of liquid level height, inlet airflow velocity, system pressure, and inlet dust concentration was determined as the optimal input set for predicting removal efficiency.

As shown in Figure 4, the liquid level height exhibits a significant positive correlation with removal efficiency (r = +0.53). This reflects the physical role of liquid level in generating collecting elements: a higher liquid level increases the probability of dust particle capture through enhanced gas–liquid contact. The fan frequency shows a negative correlation with efficiency (r = −0.26), which can be interpreted as a trade-off between airflow rate and residence time: excessive fan frequency increases velocity but reduces the time available for particle collection. Notably, the direct correlation between inlet airflow velocity (v) and efficiency is weak (r = −0.05). However, velocity shows strong correlations with mean pressure (r = −0.71) and fan frequency (r = +0.88), indicating that it influences efficiency primarily through interactions with other parameters rather than as an independent driver. This is consistent with the complex gas–liquid–dust dynamics in wet scrubbers, where velocity modulates droplet generation and particle transport in conjunction with liquid level and pressure. The inlet dust concentration (c) exhibits a moderate positive correlation with efficiency (r = +0.25), aligning with the physical phenomenon of concentration-induced agglomeration: higher dust loads promote particle-particle collisions and subsequent capture. Its relatively low collinearity with other variables suggests it provides independent information to the model. In contrast, the pressure standard deviation (σ_p) shows a very weak correlation with efficiency (r = −0.05) and substantial overlap with mean pressure, making it a redundant variable. Kernel density estimation further reveals that h, c, f, and v exhibit strong nonlinear correlations with efficiency, confirming the need for nonlinear modeling approaches such as MGGP. These physical interpretations, combined with the statistical evidence, guided our final variable selection.
Based on subsequent MGGP modeling tests, the combination of liquid level height, inlet airflow velocity, system pressure, and inlet dust concentration was determined as the optimal input set, capturing both direct effects and key interactions while avoiding multicollinearity.

To further verify the rationality of the variable combination, taking PM_2.5 as an example, modeling tests were conducted using the MGGP method, and the results are presented in Table 1. When the input variables were the liquid-level height, inlet airflow velocity, inlet dust concentration, and mean pressure, the model performance reached its optimum: the training set achieved R² = 0.94738 and RMSE = 2.45880, while the test set yielded R² = 0.91424 and RMSE = 2.48690. Overall, the combination of the liquid-level height, inlet airflow velocity, inlet dust concentration, and mean pressure not only effectively avoids interference from highly collinear variables but also integrates synergistic information from both direct and indirect effects. Verified through MGGP, this combination demonstrates the best fitting accuracy and generalization ability, and is therefore determined as the final set of input variables for predicting the removal efficiency.

3.2. Modeling Data

Table 2 provides the statistical characteristics of the input and output variables, including the minimum, mean, standard deviation, and maximum values of each variable in both the training and testing sets.

3.3. MGGP Parameter Configuration

During the construction of the MGGP model, excessive pursuit of a complex model structure to achieve perfect fitting of the training data can easily lead to a significant increase in overfitting risk, resulting in degraded generalization performance on test data. To mitigate the risk of overfitting, several measures were incorporated into the MGGP model configuration. First, the maximum tree depth was limited to 4, preventing the evolution of overly complex expressions that might fit noise in the training data. Second, the number of genes was restricted to a range of 10–12, decreasing as the particle size increased, which balances model flexibility with parsimony. Third, the use of a validation strategy, where the testing set remains unseen during training, ensures that the reported generalization performance is unbiased. Additionally, the termination criterion and the elitism fraction help maintain high-fitness individuals while preventing premature convergence to spurious solutions. These combined strategies ensure that the MGGP models capture the underlying physical relationships rather than memorizing training data artifacts.

3.4. MGGP Modeling Results

The collected 936 data sets were randomly split into a training set (748 sets) and a testing set (188 sets) at an 8:2 ratio. It is important to emphasize that the testing set was strictly held out throughout the entire model development process and was used only once, at the final stage, to evaluate the generalization performance of the trained models. No information from the testing set was used during parameter selection, model training, or evolutionary optimization. The parameter configuration (Table 3) was determined based on built-in evolutionary stopping criteria and preliminary validation on subsets of the training data, ensuring that the testing set remains an unbiased benchmark for final performance assessment. The MGGP model underwent 500 generations of evolution. Finally, mathematical models relating the removal efficiency to the liquid-level height, inlet airflow velocity, inlet dust concentration, and mean pressure were constructed for the different particle size fractions (PM₁, PM_2.5, PM₁₀, and TSP), with the specific expressions given by Equations (6), (7), (8), and (9), respectively.

\begin{matrix} η_{PM 1} & = 3.40 \times 10^{- 4} v^{2} (2 h - p) + 4.08 \cos (h p) - 4.08 \sin (p) + 10.95 \cos (\cos (h^{2} p)) \\ - 2.08 (c + p \cos (v - p) - p^{2}) / p + 3.86 \cos (3 h v) - 1.25 {(v + 2 h / p)}^{2} \\ + 3.97 \cos (h^{2} + 2 h v) + 0.52 (\sin (7.812 p) + 4 p + c + h + \cos (c)) \\ + 0.13 p (h + c) / (2 h + v p) + 595.21 (7.789 / (v + p - 6.093)) \\ + 3.86 \cos (h + 3 v) + 0.07 (2 v + c - \sin (c)) (v - 7.874 / p + 1) + 116.85 \end{matrix}

(6)

\begin{matrix} η_{PM 2.5} & = 0.036 h \cos (p) (\cos (p) + 1.031) - 0.103 (p^{2} - p + h \cos (v)) \\ - 6.871 (v + p + c - \cos (v)) / (v^{2} - p) + 0.083 (v^{2} + p^{2} + h) \\ + 1.60 \times 10^{- 6} h v^{2} p \cos (h) - 3.38 \times 10^{- 3} \cos (h) (p - 10.9604 h) \\ - 1.96 \cos (h + v + 2 p) - 0.02 (v^{3} - p^{2} + 3 p) \\ - 4.04 \times 10^{- 6} h \cos (v) (v + 2 \cos (p)) (- h^{2} + 2 p + c) \\ - 2.68 \times 10^{- 4} v^{2} (2 h - 2 v - c) - 2.58 \times 10^{- 3} (h^{2} - v^{3}) + 110.71 \end{matrix}

(7)

\begin{matrix} η_{PM 10} & = - 5.13 \times 10^{- 5} h v (2 h + v + \sin (c)) + 0.91 \cos^{2} (h^{2} v) \\ + 3.57 \times 10^{- 3} (4 h + 2 v + 2 \cos (v) + h v + 2 \sqrt{c}) \\ + 0.11 (\sin (\sin (h))) / (\sin (v + 2 p)) - 421.29 (2 h + 2 p - c) / p^{3} \\ - 0.04 (h + v + \sin (p + c)) + 5.65 \times 10^{- 6} v (h + 2.382) (2 h + v) / \cos (h v p) \\ - 1.61 \times 10^{- 5} h (h + v) / \sin (v + 2 p) + 5.82 \times 10^{- 5} (5 h + p + 1.4726) / \cos (h^{2} v) \\ + 1.80 \times 10^{- 5} h v (h + 1.513) / \cos (h) + 98.41 \end{matrix}

(8)

\begin{matrix} η_{TSP} & = 9.63 \times 10^{- 6} h^{2} v / \cos (h) + 3.00 \times 10^{- 3} h^{2} \cos (v) / (h + v + v^{p}) \\ + 695.53 h v / (p^{2} + {8.525}^{v}) - 44.67 v / (h + p + \cos (c) + v^{p} + 8.468) \\ - 3.07 \times 10^{- 3} (h^{2} - \cos (h) - \cos (p) (h + v)) \\ + 2.87 \times 10^{- 3} h^{2} (2 p + c) / (h^{2} + h + c + 8.468) \\ + 9.32 \times 10^{- 5} h v \cos (p) / \cos (h^{2} v) - 1.43 \times 10^{- 5} (h^{2} v + 8 p - c) \\ + 0.01 (\cos (h) + h (\cos (h) + 1)) + 5.88 h^{3} v / p^{3} + 99.18 \end{matrix}

(9)

Figure 5, Figure 6, Figure 7 and Figure 8 present scatter plots of the predicted versus measured removal efficiency for PM₁, PM_2.5, PM₁₀, and TSP, respectively, with subfigures (a) and (b) corresponding to the training and testing datasets. The red dashed line indicates the ideal 1:1 line. Overall, the data points for all particle size fractions cluster closely around this line, indicating that the MGGP models capture the underlying trends accurately with no significant systematic bias. Beyond the statistical metrics, several physically meaningful patterns emerge from these figures. First, a clear particle size dependency is observed: the prediction accuracy (R²) improves and the scatter decreases as particle size increases from PM₁ to TSP. For PM₁ (Figure 5), the points are more scattered, with R² values of 0.9305 (training) and 0.9179 (testing), the lowest among all fractions, and RMSE values of 4.60% and 4.43%, the highest. This reflects the inherent difficulty in predicting ultrafine particle collection, as PM₁ and PM_2.5 are primarily captured by diffusion and are highly sensitive to local flow fluctuations and complex gas–liquid interactions (Figure 6). In contrast, as particle size increases to PM₁₀ and TSP (Figure 7 and Figure 8), the scatter diminishes markedly: R² rises to 0.9662 (PM₁₀ training) and RMSE drops to 0.70% (PM₁₀ training) and 0.41% (TSP training). This trend is consistent with physical expectations: larger particles are dominated by inertial impaction and interception, which are more deterministic and less sensitive to operating variations, making them easier to predict accurately. Second, the dispersion patterns vary with efficiency levels. Increased scatter is observed in the lower efficiency range (e.g., below 60% for PM₁ and below 80% for PM_2.5).
This corresponds to suboptimal operating conditions where the scrubber is not fully effective, physically, these may involve insufficient liquid coverage, excessive airflow leading to droplet entrainment, or very low dust concentrations where agglomeration effects are negligible. Under such conditions, small perturbations in operating parameters can cause proportionally larger changes in efficiency, contributing to higher prediction uncertainty. In contrast, at high efficiency levels (above 90% for all fractions), the predictions become highly consistent, with points tightly clustered near the 1:1 line. This indicates that the model is most reliable in the near-optimal operating region, precisely where industrial applications demand accurate predictions for process control and compliance monitoring.

Importantly, the performance metrics on the testing set are very close to those on the training set for all fractions: the R² differences are only 0.0126–0.0251, and the RMSE on the testing set is slightly lower than that on the training set. This indicates excellent generalization capability with no evident overfitting, supporting the conclusion that the MGGP models can reliably predict removal efficiency under unseen operating conditions. From an industrial perspective, the achieved prediction accuracy (R² > 0.91 and RMSE < 4.7%, even for the most challenging PM₁ fraction) is sufficient for real-time monitoring and control purposes. The explicit mathematical form of the models (Equations (6)–(9)) allows for straightforward implementation in programmable logic controllers (PLCs) without the need for complex computational resources, enabling operators to receive instant feedback on expected performance and make proactive adjustments to maintain high efficiency and reduce energy consumption.

These observed patterns directly support the core propositions of this study. First, the clear particle size dependency in prediction accuracy validates that the MGGP models successfully capture the underlying shift in collection physics, from complex, diffusion-dominated mechanisms for fine particles to more deterministic inertial impaction for coarse particles. Second, the stability of predictions in the high-efficiency region (>90%) confirms the model’s suitability for industrial applications, where maintaining near-optimal performance is the primary operational goal. Third, the increased scatter under suboptimal conditions physically aligns with the higher sensitivity of scrubber performance to operating fluctuations, demonstrating that the model’s behavior is not merely statistically accurate but also physically consistent.

3.5. Interpretability of the MGGP Models

A key advantage of the MGGP framework lies in its inherent interpretability. Unlike black-box models such as neural networks or ensemble methods, MGGP produces explicit mathematical expressions (Equations (6)–(9)) that can be directly examined, differentiated, and analyzed. These formulas reveal the functional forms of variable interactions and allow domain experts to assess whether the model aligns with known physical principles. For instance, the presence of nonlinear terms such as √h and v², or interaction terms like h·p and √h × ln(c) in the derived models, provides direct insight into how liquid level height, airflow velocity, and system pressure jointly influence removal efficiency. This transparency is essential for building trust in the model and for enabling its use in real-time control, fault diagnosis, and process optimization.

4. Discussion of Model Performance

4.1. Single Factor Analysis

4.1.1. Parameter Configuration

Single-factor analysis serves as a critical bridge between the data-driven MGGP models and the underlying physical mechanisms. Due to the complex nonlinear structure of MGGP models, interactions among different variables make it challenging to delineate their individual impacts on the output. Single-factor analysis addresses this by isolating the effect of one input variable at a time while holding others constant at their mean values, thereby enabling precise identification and quantification of each variable’s contribution and influence pattern. This approach serves two essential purposes: first, it visually validates whether the model’s predictions align with established physical principles of wet scrubber operation; second, it verifies the scientific basis of model construction and the credibility of the prediction results. Thus, single-factor analysis not only enhances model interpretability but also constitutes a key step in the evaluation of MGGP models.

This study separately analyzed the influence of the three controllable variables (liquid-level height, inlet airflow velocity, and inlet dust concentration) on the removal efficiency. During the analysis, one variable was varied over its monitored range while the other two were held constant at their respective mean values, as listed in Table 2. Notably, when examining the effect of the inlet airflow velocity on the removal efficiency, three different liquid level heights were considered: h = −20 mm, h = −5.68 mm (the mean value), and h = 10 mm. Furthermore, because the mean pressure exhibits multicollinearity with the inlet airflow velocity and is jointly influenced by both the liquid-level height and the inlet airflow velocity, a fitting analysis was conducted to establish the relationship between the mean pressure and the pair (h, v). The fitting result is presented in Figure 9, and the corresponding fitting formula is given by Equation (10). Based on this formula, the mean pressure corresponding to any given liquid-level height and inlet airflow velocity can be calculated.

p = - 0.035 h^{2} - 1.894 v^{2} - 0.058 h v - 10.494 h - 57.379 v - 201.075

(10)
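As an illustration of how Equation (10) enters the single-factor analysis, the following minimal Python sketch evaluates the fitted pressure and sweeps the airflow velocity at a fixed liquid level. The function names and the `model(h, v, p, c)` call signature are assumptions made for this example only; the units follow those of the fitted data, and any of Equations (6)–(9) could be wrapped as `model`.

```python
from typing import Callable, List, Sequence, Tuple


def mean_pressure(h: float, v: float) -> float:
    """Fitted mean pressure from Equation (10) for liquid-level height h
    and inlet airflow velocity v (units as in the fitted data)."""
    return (-0.035 * h**2 - 1.894 * v**2 - 0.058 * h * v
            - 10.494 * h - 57.379 * v - 201.075)


def sweep_velocity(
    model: Callable[[float, float, float, float], float],  # hypothetical eta(h, v, p, c)
    v_values: Sequence[float],
    h: float,
    c: float,
) -> List[Tuple[float, float]]:
    """Single-factor sweep: vary v, hold h and c fixed, and recompute the
    pressure from Equation (10) at each point instead of fixing it."""
    return [(v, model(h, v, mean_pressure(h, v), c)) for v in v_values]
```

Recomputing p inside the sweep keeps the pressure consistent with each (h, v) pair, which is the purpose of the fit described above.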

4.1.2. Effect of Liquid Level Height on Removal Efficiency

Figure 10 illustrates the variation of dust removal efficiency with liquid level height. As shown in Figure 10, the removal efficiency for dust particles of different size fractions exhibited an initial increase followed by a decrease as the liquid-level height rose, reaching its maximum when the liquid level was around 0 mm. This result was consistent with previous experimental findings [32].

4.1.3. Effect of Airflow Velocity on Removal Efficiency

Figure 11 shows the variation of dust removal efficiency with airflow velocity. Under different liquid level heights, the removal efficiency of PM₁ exhibited an initial increase followed by a decrease as the airflow velocity increased (Figure 11a). Moreover, the influence of airflow velocity on the removal efficiency was relatively pronounced, and this effect became even stronger as the liquid level height decreased. This can be attributed to the smaller particle size of PM₁, which allows it to follow the airflow more readily, making it more sensitive to changes in airflow velocity. The variation pattern of PM_2.5 removal efficiency with airflow velocity was similar to that of PM₁ (Figure 11b), also exhibiting an initial increase followed by a decrease as the airflow velocity increased. However, a difference in the degree of influence was observed: with the increase in particle size fractions, the impact of airflow velocity on the removal efficiency of PM_2.5 was somewhat attenuated. The variation of removal efficiency for PM₁₀ and TSP with airflow velocity showed a high degree of correlation with the liquid level height (Figure 11c,d). At a liquid level height of −20 mm, the removal efficiency initially increased and then decreased as the airflow velocity increased, whereas under the other two liquid level conditions, the removal efficiency remained relatively stable. In summary, the influence of airflow velocity on the removal efficiency was significantly associated with the liquid level height. These two parameters did not act independently on the dust removal process; rather, they jointly regulated the removal efficiency through synergistic or antagonistic effects.

4.1.4. Effect of Inlet Dust Concentration on Removal Efficiency

Figure 12 depicts the variation of dust removal efficiency with inlet dust concentration. In general, the removal efficiency increased as the inlet dust concentration increased, but the rate and magnitude of this increase differed among particle size fractions. As shown in Figure 12a,b, the removal efficiencies of PM₁ and PM_2.5 increased significantly with rising inlet dust concentration, with both exceeding a 10% increase, and PM₁ increasing more rapidly than PM_2.5. In contrast, the removal efficiencies of PM₁₀ and TSP increased only slightly with increasing inlet dust concentration, with growth rates remaining below 0.6%, significantly lower than those observed for finer particles (Figure 12c,d).

4.2. Sensitivity Analysis

4.2.1. The Elementary Effect Test

Sensitivity analysis is a critical step after establishing a mathematical model, as it assesses the influence of the input variables on the modeling results. For the dust removal efficiency of wet scrubbers, the output is governed by the coupled effects of multiple input variables such as liquid level height, airflow velocity, and dust concentration, resulting in distinctly nonlinear behavior. The EET simultaneously quantifies both the main effects and the interactive effects of variables by computing importance and interaction indices [33], and it is well-suited to the nonlinear output characteristics of the MGGP models. Therefore, this study employs the EET to conduct a global sensitivity analysis of the dust removal efficiency predicted by the MGGP models. The main steps of the EET are as follows:

First, Latin hypercube sampling is used to generate m random sample points for each input variable, ensuring uniform distribution of samples across each variable dimension. Next, following a “one-factor-at-a-time” approach, a small perturbation is applied to each input variable while keeping the other variables fixed, and the output change before and after the perturbation is calculated. Finally, based on the output changes, the elementary effect for each variable is computed and further aggregated to obtain the importance and interaction indices. The calculation of the EET follows Equations (11)–(13):

E E_{j i} = \frac{η (x_{j i} + Δ x_{i}) - η (x_{j i})}{Δ x_{i}}

(11)

where EE_ji is the elementary effect of the i-th variable at the j-th sampling point; x_ji is the value of the i-th variable at the j-th sampling point; and Δx_i is the perturbation step size of the i-th variable, taken as 5% of its range in this study.

μ_{i} = \frac{1}{m} \sum_{j = 1}^{m} | E E_{j i} |

(12)

where the importance index μ_i reflects the degree of influence of the i-th input variable on the output, and m is the number of sampling points, set to 100 in this study.

σ_{i} = \sqrt{\frac{1}{m - 1} \sum_{j = 1}^{m} {(E E_{j i} - {\bar{E E}}_{i})}^{2}}, {\bar{E E}}_{i} = \frac{1}{m} \sum_{j = 1}^{m} E E_{j i}

(13)

where the interaction index σ_i quantifies the strength of interaction between the i-th variable and the other variables.
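The procedure above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: `eta` stands for any callable model, `bounds` holds assumed variable ranges, and the Latin hypercube sampler is a simple hand-written version.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def latin_hypercube(bounds: Sequence[Tuple[float, float]], m: int,
                    rng: random.Random) -> List[List[float]]:
    """Draw m points; each variable's range is split into m equal strata
    and exactly one value is drawn from every stratum."""
    columns = []
    for lo, hi in bounds:
        strata = list(range(m))
        rng.shuffle(strata)
        columns.append([lo + (s + rng.random()) / m * (hi - lo) for s in strata])
    return [list(point) for point in zip(*columns)]


def elementary_effects(eta: Callable[[Sequence[float]], float],
                       bounds: Sequence[Tuple[float, float]],
                       m: int = 100, step_frac: float = 0.05,
                       seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return the importance indices (mean |EE|) and interaction indices
    (sample standard deviation of EE) for every input variable."""
    rng = random.Random(seed)
    points = latin_hypercube(bounds, m, rng)
    mu, sigma = [], []
    for i, (lo, hi) in enumerate(bounds):
        dx = step_frac * (hi - lo)  # perturbation: 5% of the range by default
        effects = []
        for x in points:
            x_step = list(x)
            x_step[i] += dx  # one-factor-at-a-time; may step slightly past hi
            effects.append((eta(x_step) - eta(x)) / dx)
        mean_ee = sum(effects) / m
        mu.append(sum(abs(e) for e in effects) / m)
        sigma.append(math.sqrt(sum((e - mean_ee) ** 2 for e in effects) / (m - 1)))
    return mu, sigma
```

For a purely additive linear model the interaction index is zero, because every elementary effect of a given variable is identical; a nonzero value therefore signals nonlinearity or coupling with other inputs.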

4.2.2. Quantification of Variable Relative Importance

To quantify the relative importance of each input variable on the dust removal efficiency, the absolute mean elementary effect was employed as the evaluation metric. Normalization was further applied to determine the relative influence degree of each variable (ω), calculated as follows:

ω_{i} = \frac{μ_{i}}{\sum_{k = 1}^{n} μ_{k}}

(14)

where n is the total number of input variables.
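The normalization in Equation (14) amounts to dividing each importance index by the sum over all input variables. A minimal sketch follows; the function name is an assumption for this example, and the numbers in the usage note are hypothetical, not values from this study.

```python
from typing import List, Sequence


def relative_importance(mu: Sequence[float]) -> List[float]:
    """Normalize importance indices so that they sum to 1 (Equation (14))."""
    total = sum(mu)
    if total == 0:
        raise ValueError("all importance indices are zero; nothing to normalize")
    return [value / total for value in mu]
```

For hypothetical indices `[2.0, 1.0, 1.0]`, the first variable accounts for half of the total influence and the other two for a quarter each.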

4.2.3. Sensitivity of Dust Removal Efficiency to Different Variables

Figure 13 summarizes the importance and interaction indices of the different input variables. The ranking of variable importance varied to some extent with the dust particle size fractions (Figure 13a). For PM₁, the order of importance for the three input variables was inlet airflow velocity > inlet dust concentration > liquid-level height. For the larger particle sizes, PM_2.5, PM₁₀, and TSP, the order was inlet airflow velocity > liquid-level height > inlet dust concentration. Furthermore, an examination of the changes in the relative importance of key variables revealed that the sensitivity of removal efficiency to inlet airflow velocity increased continuously with increasing dust particle size fraction, rising from 69.03% for PM₁ to 87.34% for TSP. This indicates that inlet airflow velocity is a critical parameter determining removal efficiency. In contrast, the sensitivity of removal efficiency to inlet dust concentration showed the opposite trend. This is primarily because smaller particles are more prone to agglomeration, leading to an increase in removal efficiency as the inlet dust concentration rises. The sensitivity of removal efficiency to liquid-level height remained relatively stable across particle size fractions, consistently ranging between 11.22% and 13.19%, without showing a discernible trend related to dust size fractions.

In terms of the interaction index, the strength of interaction among the input variables showed a positive correlation with their importance. This pattern was observed across the analyses for different particle size fractions. Among all variables, inlet airflow velocity exhibited the strongest interaction with the other variables, significantly surpassing the interaction levels of the remaining parameters. This is primarily because airflow velocity directly determines the flow regime inside the scrubber, influencing the dispersion, transport, and deposition processes of dust particles, while simultaneously affecting the generation and dynamic state of the collection elements. When the airflow velocity is too low, the flow fails to produce sufficient or effective collection elements, leaving a portion of the dust inadequately captured. Conversely, excessively high airflow velocity reduces the contact time between dust particles and the collection elements, also leading to a decline in removal efficiency. For all fractions except PM₁, the interaction strength of liquid-level height with the other variables ranked second only to that of inlet airflow velocity (Figure 13c–e). This is because the liquid level plays a crucial role in the generation of collection elements and the morphology of liquid flow. For PM₁, however (Figure 13b), the interaction pattern exhibited distinct characteristics: the interaction between inlet dust concentration and the other variables was stronger than that of liquid-level height. This can be attributed to the fact that fine particles are more susceptible to concentration-induced agglomeration, which subsequently alters their synergistic relationships with other variables.

4.3. Uncertainty Analysis

4.3.1. Quantile Regression

In predictive modeling, uncertainty analysis is used to measure the reliability and limitations of model predictions. Its core objective is to identify and quantify the impact of various sources of uncertainty on the prediction results, thereby delineating the model’s applicable boundaries. Common methods include Monte Carlo simulation [34], Bayesian approaches [35], and QR [36]. Unlike the former two, QR does not require assumptions about the data distribution and can directly estimate the prediction intervals for any quantile. Consequently, it demonstrates greater robustness in scenarios involving non-normality or heteroscedasticity. This study employs the QR method to conduct an uncertainty analysis of the dust removal efficiency prediction model. The fundamental assumption of this method is a linear relationship between the measured and predicted values of removal efficiency. The predicted removal efficiency values output by the MGGP model serve as the independent variable, while the experimentally measured removal efficiency values act as the dependent variable. Based on this, by fitting selected quantiles (the 0.05 and 0.95 quantiles), the lower bound (L_i) and upper bound (U_i) of the prediction interval corresponding to each sample are obtained. The mathematical model is shown in Equation (15):

\{\begin{matrix} L_{i} = α_{1} η_{p i} + β_{1} \\ U_{i} = α_{2} η_{p i} + β_{2} \end{matrix}

(15)

where η_pi is the point-predicted value; L_i and U_i are the lower and upper bounds of the interval, respectively; α₁ and β₁ are the coefficient and intercept corresponding to the 0.05 quantile; α₂ and β₂ are the coefficient and intercept corresponding to the 0.95 quantile.
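To make the construction of L_i and U_i concrete, the sketch below fits a linear quantile line by minimizing the pinball (check) loss. It is an illustration under stated assumptions rather than the solver used in this study: it relies on the fact that, for a straight-line fit, a minimizer can always be found among the lines passing through two data points, and it simply enumerates those lines. This is exact but costs O(n³), so a linear-programming solver from a statistics library would normally be preferred for large datasets. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def pinball_loss(residuals: Sequence[float], tau: float) -> float:
    """Check loss of quantile regression for residuals r = y - (a*x + b)."""
    return sum(tau * r if r >= 0 else (tau - 1.0) * r for r in residuals)


def fit_quantile_line(x: Sequence[float], y: Sequence[float],
                      tau: float) -> Tuple[float, float]:
    """Return slope a and intercept b of the tau-quantile line y = a*x + b.
    Here x holds model-predicted efficiencies and y the measured ones."""
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue  # a vertical line is not a valid candidate
            a = (y[j] - y[i]) / (x[j] - x[i])
            b = y[i] - a * x[i]
            loss = pinball_loss([yk - (a * xk + b) for xk, yk in zip(x, y)], tau)
            if best is None or loss < best[0]:
                best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]


def prediction_interval(eta_pred: float, lower: Tuple[float, float],
                        upper: Tuple[float, float]) -> Tuple[float, float]:
    """Evaluate Equation (15): bounds from the 0.05 and 0.95 quantile lines."""
    (a1, b1), (a2, b2) = lower, upper
    return a1 * eta_pred + b1, a2 * eta_pred + b2
```

In use, `fit_quantile_line` would be called twice on the training data, once with `tau=0.05` and once with `tau=0.95`, and the two resulting (slope, intercept) pairs passed to `prediction_interval` for each new point prediction.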

4.3.2. MPI and PICP

To quantitatively evaluate the uncertainty of the model, two key evaluation metrics were introduced. The mean prediction interval width (MPI) measures the overall width of the prediction intervals; the smaller its value, the lower the dispersion and the weaker the uncertainty of the prediction results. The prediction interval coverage probability (PICP) assesses the proportion of measured values falling within the intervals; the closer the PICP is to the preset confidence level (90%), the higher the reliability of the prediction intervals and the better the control of model uncertainty. The specific calculation formulas for these two metrics are given by Equations (16) and (17):

M P I = \frac{1}{N} \sum_{i = 1}^{N} (U_{i} - L_{i})

(16)

P I C P = \frac{1}{N} \sum_{i = 1}^{N} I_{i}, I_{i} = \{\begin{matrix} 1, & L_{i} < η_{i} < U_{i} \\ 0, & \text{otherwise} \end{matrix}

(17)

where N is the number of sample points and η_i is the measured removal efficiency of the i-th sample.
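Equations (16) and (17) translate directly into code. The following minimal sketch assumes three equal-length sequences holding the lower bounds, upper bounds, and measured efficiencies; the function names are chosen for this example only.

```python
from typing import Sequence


def mean_prediction_interval_width(lower: Sequence[float],
                                   upper: Sequence[float]) -> float:
    """MPI, Equation (16): average width of the prediction intervals."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


def coverage_probability(measured: Sequence[float], lower: Sequence[float],
                         upper: Sequence[float]) -> float:
    """PICP, Equation (17): fraction of measured values strictly inside
    their own prediction interval."""
    inside = sum(1 for y, l, u in zip(measured, lower, upper) if l < y < u)
    return inside / len(measured)
```

Because the inequality in Equation (17) is strict, a measured value lying exactly on a bound is counted as outside the interval.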

4.3.3. QR Prediction Analysis

Figure 14 presents the prediction interval boundaries for the removal efficiency of dust particles with different sizes, derived using Quantile Regression (QR). The results show that for all particle sizes, the slopes of the upper boundaries range from 0.76 to 0.84, while those of the lower boundaries range from 1.07 to 1.23. The marked difference in slopes between the upper and lower bounds indicates an asymmetric trend in the prediction intervals. As the predicted efficiency increases, the upper boundary rises slowly, whereas the lower boundary increases more sharply, resulting in a progressive narrowing of the prediction interval width. This structural characteristic reflects a contraction in prediction error under high-efficiency conditions, suggesting improved model stability in the high-value region.

By substituting the predicted dust removal efficiency values into the lower and upper bounds obtained from Figure 14, the 90% confidence-level prediction intervals shown in Figure 15, Figure 16, Figure 17 and Figure 18 were obtained. These figures present the predicted values, measured values, and the corresponding 90% prediction intervals of the removal efficiency. It can be observed that the majority of the measured values fall within the 90% prediction intervals, with only a small portion lying outside. Moreover, the width of the 90% prediction intervals increases as the dust particle size fraction decreases; however, the interval width remains acceptable for practical engineering applications. In summary, the constructed prediction model is able to provide reasonable interval forecasts of dust removal efficiency in most cases, demonstrating its reliability to a certain extent. It should be noted that the dataset used for this uncertainty analysis is identical to that described in Section 3.4, where the 936 experimental samples were randomly divided into a training set (748 samples) and a testing set (188 samples) at an 8:2 ratio. The quantile regression models were fitted using the training set, and the resulting prediction intervals were evaluated on both training and testing sets to ensure consistency and reliability.

4.3.4. Statistical Analysis of MPI and PICP

Figure 19 displays the statistical results of MPI and PICP. As shown in Figure 19a, for PM₁, the MPI values were relatively large, reaching 15.32% for the training set and 14.92% for the test set. This is because the removal efficiency of fine particles is more sensitive to operating parameters, leading to much greater fluctuation in efficiency compared with larger particles, which significantly increases the difficulty of model prediction. Nevertheless, these values generally fall within an acceptable range. For PM_2.5, the MPI decreased by more than 50%, dropping to 7.01% for the training set and 6.78% for the test set. For PM₁₀ and TSP, the MPI decreased to below 1.52%. The PICP values for all particle sizes were close to 0.9, nearly aligning with the preset confidence level of 90% (Figure 19b). This indicates excellent reliability of the prediction intervals, as most measured values of removal efficiency fell within or near the 90% prediction intervals. In summary, the MGGP model exhibits relatively low uncertainty in predicting the dust removal efficiency of wet scrubbers.

4.4. Comparative Analysis

To evaluate the applicability of the MGGP models in predicting dust removal efficiency, this study adopted R² and RMSE as evaluation metrics and compared their performance with that of the XGBoost and MNR models. As shown in Table 3, on the training set, the XGBoost model exhibited the strongest fitting ability, achieving R² values above 0.98051 for all particle size fractions, with RMSE ranging from 0.21522 to 2.43822. The MGGP models performed slightly worse, with R² values ranging from 0.93049 to 0.96624; their RMSE, although higher than that of XGBoost, was significantly lower than that of the MNR model (which generally had R² below 0.87564 and RMSE as high as 6.70736).

The test-set results further indicated that the XGBoost model exhibited good generalization capability, with R² decreasing by only 0.01344 to 0.05660 compared with the training set, showing no obvious overfitting. The MGGP model displayed an even smaller reduction in R² on the test set (0.01257 to 0.02510), demonstrating superior generalization stability. In contrast, for the prediction of PM₁₀ and TSP, the MNR model showed a difference in R² between the training and test sets exceeding 0.12, indicating markedly insufficient robustness.
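For readers who wish to reproduce this kind of comparison, the sketch below computes RMSE, R², and the train-test gap in R². It assumes the common definition R² = 1 − SS_res/SS_tot, which may differ from the exact formula used in this study, and the function names are illustrative.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def r2_gap(r2_train: float, r2_test: float) -> float:
    """Training R² minus testing R²; smaller gaps indicate more stable
    generalization."""
    return r2_train - r2_test
```

With the PM₁ values reported for the MGGP model (0.9305 on the training set and 0.9179 on the testing set), `r2_gap` gives approximately 0.0126.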

Beyond these global metrics, several additional dimensions merit consideration when comparing the three approaches:

Generalization stability. The ΔR² (training R² minus testing R²) provides a direct measure of generalization capability. MGGP achieves the smallest ΔR² across all particle sizes (0.0126–0.0251), indicating that its performance degrades minimally when applied to unseen data. XGBoost shows slightly larger ΔR² values (0.0134–0.0566), while MNR exhibits the largest degradation (0.0511–0.1380), confirming its limited suitability for this nonlinear problem.

Interpretability. The three models differ fundamentally in their transparency. MGGP produces explicit mathematical expressions (Equations (6)–(9)) that can be directly examined, differentiated, and analyzed to understand variable interactions and physical dependencies. XGBoost, despite its high accuracy, operates as a black box—while feature importance scores can be extracted, the exact functional relationships between inputs and outputs remain opaque. MNR provides full transparency through its parametric form, but its inability to capture the inherent nonlinearities in the data limits its practical utility. Thus, MGGP uniquely combines accuracy with interpretability.

Error distribution characteristics. Although detailed error distribution data are not presented here, the RMSE values in Table 4 offer indirect insight. For PM₁, the most challenging fraction, MGGP achieves a test-set RMSE of 4.42530, compared to 3.67202 for XGBoost and 6.88391 for MNR. The modest gap between MGGP and XGBoost on this difficult task (about 0.75 in RMSE), coupled with MGGP’s superior interpretability, suggests that the trade-off between accuracy and transparency is acceptable for industrial applications where understanding the underlying mechanisms is valuable.

Practical deployability. For real-time monitoring and control, the computational cost of model deployment is critical. MGGP’s explicit formulas require only elementary arithmetic and standard functions such as cosine and exponentiation (see Equation (9)), making them easily implementable in programmable logic controllers (PLCs) with minimal computational resources. XGBoost, while offering high accuracy, requires tree ensemble traversal and specialized software libraries, which may increase deployment complexity. MNR, though computationally simple, lacks the accuracy needed for reliable control.

In summary, XGBoost performs best in terms of predictive accuracy, while the MNR model is the weakest overall. Although the accuracy of the MGGP model is slightly lower than that of XGBoost, it offers better generalization stability and can output explicit mathematical expressions, giving it unique advantages in interpretability and engineering applicability. Thus, the MGGP model is the more suitable choice for real-time optimization and control of dust-removal systems.

It should be noted that the comparative analysis in this study focused on XGBoost and MNR as representative benchmarks. XGBoost was selected as a state-of-the-art black-box model to illustrate the trade-off between accuracy and interpretability, while MNR served as a traditional statistical baseline. A wider range of models, including support vector regression (SVR), random forests, Gaussian processes, and deep learning approaches such as convolutional neural networks (CNN), could potentially be applied to this problem. However, given the tabular nature and moderate size of the dataset (936 samples), many of these methods are either prone to overfitting (deep learning) or offer limited additional insight beyond the selected comparators. Moreover, the primary contribution of this work lies in the interpretability of the MGGP framework rather than in achieving the absolute highest predictive accuracy. Nevertheless, future studies should consider a more comprehensive benchmarking analysis to further validate the competitiveness of MGGP against a broader spectrum of machine learning algorithms, including emerging AutoML approaches.

5. Limitations and Future Directions

Despite the promising results demonstrated in this study, several limitations should be acknowledged, which also open avenues for future research. First, the experimental data used in this study were collected from a single wet scrubber installed at a specific industrial site (an ore-crushing workshop). While the operating conditions covered a wide range, the generalizability of the MGGP models to other scrubber designs, scales, or dust types remains to be validated. Future work should involve cross-validation using data from multiple scrubber systems or different industrial processes to assess the transferability of the proposed modeling framework. Second, the input variables considered in this study, liquid level height, inlet airflow velocity, system pressure, and inlet dust concentration, were selected based on correlation analysis and domain knowledge. However, other potentially influential factors, such as dust particle size distribution (beyond the four fractions considered), gas temperature and humidity, and liquid properties (e.g., surface tension, viscosity), were not included due to limitations in the experimental setup. Incorporating these variables could further improve model accuracy and physical comprehensiveness. Third, although the MGGP models exhibit strong generalization within the tested parameter ranges, their extrapolation capability beyond these ranges has not been systematically evaluated. The models should be used with caution when predicting performance under extreme operating conditions not covered by the training data. Future studies could explore hybrid approaches that combine MGGP with first-principles models to enhance extrapolation reliability. 
Fourth, while the MGGP models incorporate structural complexity constraints and built-in evolutionary stopping to prevent overfitting, and while single-factor analysis confirms physical consistency with experimental trends, a formally partitioned validation set was not employed for model selection. Future work with larger datasets may adopt a three-way split (training/validation/testing) to further enhance methodological rigor. Finally, while the interpretability of MGGP is a key strength, the derived mathematical expressions are still relatively complex for on-site operators. Future research could focus on developing simplified empirical correlations or visualization tools that distill the essential insights into easily understandable guidelines for industrial practitioners. Additionally, integrating the MGGP models into a closed-loop control system for real-time optimization of scrubber performance would be a practical next step toward intelligent industrial automation.

6. Conclusions

This study developed a novel interpretable modeling framework based on MGGP for predicting the real-time removal efficiency of wet scrubbers. The model was constructed using liquid-level height, inlet airflow velocity, system pressure, and inlet dust concentration as core input variables. A comprehensive performance evaluation and applicability analysis were conducted, leading to the following key conclusions:

(1) Single-factor analysis validated the model’s alignment with physical mechanisms. Removal efficiency exhibited a non-monotonic relationship with liquid-level height, peaking near 0 mm. The effect of airflow velocity was coupled with liquid level and particle size fraction, while efficiency increased with inlet dust concentration, especially for finer dust.

(2) Global sensitivity analysis revealed that inlet airflow velocity was consistently the most influential parameter across all particle size fractions. The importance ranking of liquid-level height and inlet dust concentration varied with particle size fraction, and interaction strength was strongly correlated with parameter importance.

(3) The uncertainty analysis based on QR demonstrated that the prediction intervals for PM₁ were relatively wide due to its higher operational sensitivity. For PM_2.5, PM₁₀, and TSP, the MPI values did not exceed 7.01%. The PICP values for all particle size fractions were close to 0.9, aligning well with the preset 90% confidence level, indicating that the model predictions possess acceptable reliability with relatively low uncertainty.

(4) The MGGP model achieved a favorable balance between accuracy, generalization capability, and interpretability. It is comparable in accuracy to XGBoost while providing explicit mathematical expressions that directly quantify parameter coupling, thereby enabling low-computational-cost real-time monitoring and process insight.

Author Contributions

Conceptualization, L.Z. and R.Z.; methodology, T.W. and R.Z.; validation, L.Z., T.W. and R.Z.; investigation, J.Z.; resources, X.L.; data curation, H.L.; writing—original draft preparation, L.Z. and R.Z.; writing—review and editing, T.W.; funding acquisition, T.W. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the basic research funding of China Academy of Safety Science and Technology, grant number: 2025JBKY15 and the National Natural Science Foundation of China, grant number: 52374236.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MGGP	Multi-Gene Genetic Programming
EET	Elementary Effect Test
QR	Quantile Regression
XGBoost	Extreme Gradient Boosting
MNR	Multiple Nonlinear Regression
PM₁	Particulate Matter with aerodynamic diameter ≤ 1 μm
PM_2.5	Particulate Matter with aerodynamic diameter ≤ 2.5 μm
PM₁₀	Particulate Matter with aerodynamic diameter ≤ 10 μm
TSP	Total Suspended Particulate
RMSE	Root Mean Square Error
R²	Coefficient of Determination
MPI	Mean Prediction Interval
PICP	Prediction Interval Coverage Probability
PLC	Programmable Logic Controller

Figure 1. Schematic diagram of the experimental system.

Figure 2. Schematic diagram of a three-gene MGGP model structure: (a) Gene 1: η₁ = sin(x₁) + x₂ × x₃; (b) Gene 2: η₂ = ln(x₂) − x₄; (c) Gene 3: η₃ = cos(x₃) × (x₂ − x₄).
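
The three-gene structure in Figure 2 can be written out as a short sketch. The gene expressions are taken from the caption; the bias and gene weights below are hypothetical placeholders (in MGGP such weights are typically estimated by least squares on the training data), and the function names are illustrative only.

```python
import math

def gene1(x1, x2, x3, x4):
    return math.sin(x1) + x2 * x3

def gene2(x1, x2, x3, x4):
    return math.log(x2) - x4  # natural log, so x2 must be positive

def gene3(x1, x2, x3, x4):
    return math.cos(x3) * (x2 - x4)

GENES = (gene1, gene2, gene3)

def mggp_predict(x, bias, weights):
    """Weighted linear combination of the gene outputs plus a bias term."""
    return bias + sum(w * g(*x) for w, g in zip(weights, GENES))

# Hypothetical input point and coefficients, chosen only for illustration.
x = (0.0, 1.0, 0.0, 2.0)
bias, weights = 1.0, (0.5, 2.0, -1.0)
eta = mggp_predict(x, bias, weights)
```

Because each gene is an explicit expression, the contribution of every term to the prediction can be inspected directly, which is the basis of the interpretability claimed for the model.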

Figure 3. The core modeling process of MGGP.

Figure 4. Correlation and distribution matrix plot of the input and output variables.

Figure 5. Scatter plot of predicted versus measured values for PM₁: (a) Training set; (b) Testing set.

Figure 6. Scatter plot of predicted versus measured values for PM_2.5: (a) Training set; (b) Testing set.

Figure 7. Scatter plot of predicted versus measured values for PM₁₀: (a) Training set; (b) Testing set.

Figure 8. Scatter plot of predicted versus measured values for TSP: (a) Training set; (b) Testing set.

Figure 9. Nonlinear fitting result of p versus h and v.

Figure 10. The variation of dust removal efficiency with liquid level height: (a) PM₁; (b) PM_2.5; (c) PM₁₀; (d) TSP.

Figure 11. The variation of dust removal efficiency with airflow velocity: (a) PM₁; (b) PM_2.5; (c) PM₁₀; (d) TSP.

Figure 12. The variation of dust removal efficiency with inlet dust concentration: (a) PM₁; (b) PM_2.5; (c) PM₁₀; (d) TSP.

Figure 13. The importance and interaction indices of the different input variables: (a) Relative importance percentages of input variables; (b) Combined importance and interaction indices between input variables for PM₁; (c) Combined importance and interaction indices between input variables for PM_2.5; (d) Combined importance and interaction indices between input variables for PM₁₀; (e) Combined importance and interaction indices between input variables for TSP.
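
The importance and interaction indices in Figure 13 come from the elementary effect test. A minimal sketch of the idea follows, assuming inputs scaled to [0, 1], a simple radial one-at-a-time design, and a toy model standing in for the fitted MGGP expressions. The mean absolute elementary effect is read as importance and the standard deviation as an indicator of interaction or nonlinearity. The sampling design and sample size here are assumptions, not the settings used in the study.

```python
import random
from statistics import mean, pstdev

def elementary_effects(f, n_inputs, n_base=50, delta=0.1, seed=0):
    """Return (mean |EE|, std of EE) per input from a radial OAT design."""
    rng = random.Random(seed)
    effects = [[] for _ in range(n_inputs)]
    for _ in range(n_base):
        # Keep base points low enough that the step stays inside [0, 1].
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_inputs)]
        f_base = f(base)
        for i in range(n_inputs):
            stepped = list(base)
            stepped[i] += delta
            effects[i].append((f(stepped) - f_base) / delta)
    mu_star = [mean(abs(e) for e in col) for col in effects]
    sigma = [pstdev(col) for col in effects]
    return mu_star, sigma

def toy_model(x):
    # x[0] acts additively; x[1] and x[2] act only through their interaction.
    return 2.0 * x[0] + x[1] * x[2]

mu_star, sigma = elementary_effects(toy_model, n_inputs=3)
```

In this toy case the additive input has a constant elementary effect (high importance, near-zero spread), while the two interacting inputs show a nonzero spread.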

Figure 14. Two linear QR models for training data and testing data: (a) PM₁; (b) PM_2.5; (c) PM₁₀; (d) TSP.

Figure 15. Uncertainty intervals for the proposed MGGP model for PM₁.

Figure 16. Uncertainty intervals for the proposed MGGP model for PM_2.5.

Figure 17. Uncertainty intervals for the proposed MGGP model for PM₁₀.

Figure 18. Uncertainty intervals for the proposed MGGP model for TSP.

Figure 19. Statistical results of MPI and PICP for the prediction models of dust with different particle size fractions: (a) MPI; (b) PICP.

Table 1. Performance metrics of the MGGP model under different input variable combinations.

Input Variables	Training R²	Training RMSE	Testing R²	Testing RMSE
h/v/p/c	0.94738	2.45880	0.91424	2.48690
h/f/p/c	0.92908	2.85450	0.90825	2.57210
h/v/p/c/σ_p	0.92718	2.89250	0.89187	2.79240
h/v/p	0.93916	2.64380	0.88432	2.88820

Table 2. Statistical analysis of the input and output variables.

Variable	Dust Size Fraction	Min (Train)	Min (Test)	Mean (Train)	Mean (Test)	Std (Train)	Std (Test)	Max (Train)	Max (Test)
h (mm)	/	−60	−60	5.68	−4.71	32.54	33.13	45	45
v (m/s)	/	0.47	0.47	8.75	8.40	4.44	4.48	15.98	15.98
p (Pa)	/	−1767	−1767	−860	−825	498	503	−59	−59
c (μg/m³)	PM₁	86	88	129	127	12	15	147	146
c (μg/m³)	PM_2.5	323	343	551	547	45	57	661	599
c (μg/m³)	PM₁₀	3401	4875	10,950	11,335	2784	3063	20,079	18,597
c (μg/m³)	TSP	5441	7996	24,138	25,651	10,414	11,389	59,787	57,671
η (%)	PM₁	18.03	17.21	75.59	76.52	17.68	16.42	95.32	95.40
η (%)	PM_2.5	50.44	52.78	89.01	89.92	11.20	10.72	98.38	98.35
η (%)	PM₁₀	80.30	80.22	97.54	98.03	5.22	7.57	99.83	99.82
η (%)	TSP	90.51	90.75	98.78	99.04	4.05	7.31	99.90	99.90

Table 3. Parameter settings for the MGGP algorithm.

Parameter	Settings
Population size	800
Number of generations	500
Tournament size	6
Elitism fraction	0.15
Terminate value	0.003
Crossover rate	0.84
Mutation rate	0.14
Direct reproduction	0.02
Function set	+, −, ×, /, power, sum3, prod3, sqrt, square, sin, cos
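
As a quick consistency check on Table 3, the settings can be held in a plain dictionary. The key names below are hypothetical and are not tied to any particular MGGP toolbox. The sketch assumes that the crossover, mutation, and direct reproduction rates partition the offspring-generating events, and that the elitism fraction is the share of the population copied unchanged to the next generation.

```python
import math

# Values copied from Table 3; the dictionary keys are hypothetical names.
MGGP_SETTINGS = {
    "population_size": 800,
    "generations": 500,
    "tournament_size": 6,
    "elitism_fraction": 0.15,
    "termination_value": 0.003,
    "crossover_rate": 0.84,
    "mutation_rate": 0.14,
    "direct_reproduction": 0.02,
}

def operator_rates_are_consistent(settings):
    """Check that the three genetic operator rates sum to 1."""
    total = (settings["crossover_rate"]
             + settings["mutation_rate"]
             + settings["direct_reproduction"])
    return math.isclose(total, 1.0, abs_tol=1e-9)

def elite_count(settings):
    """Number of individuals retained per generation (assumed meaning)."""
    return round(settings["elitism_fraction"] * settings["population_size"])

rates_ok = operator_rates_are_consistent(MGGP_SETTINGS)
n_elite = elite_count(MGGP_SETTINGS)
```

Under these assumptions, the listed rates are mutually consistent and the elitism fraction corresponds to 120 of the 800 individuals.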

Table 4. Performance comparison of the MGGP, XGBoost, and MNR models.

Model	Dust Size Fraction	R² (Train)	R² (Test)	RMSE (Train)	RMSE (Test)
XGBoost	PM₁	0.98051	0.94348	2.43822	3.67202
XGBoost	PM_2.5	0.98268	0.92608	1.41067	2.30881
XGBoost	PM₁₀	0.98224	0.96880	0.50849	0.44874
XGBoost	TSP	0.98638	0.96860	0.21522	0.21695
MGGP	PM₁	0.93049	0.91792	4.60406	4.42530
MGGP	PM_2.5	0.94346	0.91836	2.54869	2.42636
MGGP	PM₁₀	0.96624	0.94934	0.70103	0.57177
MGGP	TSP	0.95167	0.93750	0.40540	0.30606
MNR	PM₁	0.85247	0.80137	6.70736	6.88391
MNR	PM_2.5	0.87564	0.81110	3.77995	3.69071
MNR	PM₁₀	0.86026	0.73722	1.42635	1.30225
MNR	TSP	0.85234	0.71430	0.70861	0.65440
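
One simple way to read the generalization behaviour in Table 4 is the drop in R² from training to testing. The sketch below computes the mean drop per model from the tabulated values. Treating this gap as a generalization indicator is an assumption made here for illustration, not a metric defined in the study.

```python
from statistics import mean

# (train R², test R²) pairs copied from Table 4.
TABLE4_R2 = {
    "XGBoost": {"PM1": (0.98051, 0.94348), "PM2.5": (0.98268, 0.92608),
                "PM10": (0.98224, 0.96880), "TSP": (0.98638, 0.96860)},
    "MGGP": {"PM1": (0.93049, 0.91792), "PM2.5": (0.94346, 0.91836),
             "PM10": (0.96624, 0.94934), "TSP": (0.95167, 0.93750)},
    "MNR": {"PM1": (0.85247, 0.80137), "PM2.5": (0.87564, 0.81110),
            "PM10": (0.86026, 0.73722), "TSP": (0.85234, 0.71430)},
}

def mean_r2_gap(rows):
    """Average train-minus-test R² across the four size fractions."""
    return mean(train - test for train, test in rows.values())

gaps = {model: mean_r2_gap(rows) for model, rows in TABLE4_R2.items()}
```

On these numbers MGGP shows the smallest average train-to-test drop, XGBoost a somewhat larger one, and MNR the largest, while XGBoost retains the highest absolute test R².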
