Permeation Flux Prediction of Vacuum Membrane Distillation Using Hybrid Machine Learning Techniques

Vacuum membrane distillation (VMD) has attracted increasing interest for various applications besides seawater desalination. Experimental testing of membrane technologies such as VMD on a pilot or large scale can be laborious and costly. Machine learning techniques can be a valuable tool for predicting membrane performance on such scales. In this work, a novel hybrid model was developed by incorporating the spotted hyena optimizer (SHO) into support vector regression (SVR) to predict the permeate flux in VMD. The SVR-SHO hybrid model was validated with experimental data and benchmarked against other machine learning tools such as artificial neural networks (ANNs), classical SVR, and multiple linear regression (MLR). The results show that SVR-SHO predicted the permeate flux with high accuracy, with a correlation coefficient (R) of 0.94, whereas the other models showed lower prediction accuracy, with R-values ranging from 0.801 to 0.902. A global sensitivity analysis was applied to interpret the obtained results, revealing that feed temperature was the most influential operating parameter on flux, with a relative importance score of 52.71 compared to 17.69, 17.16, and 14.44 for feed flowrate, vacuum pressure intensity, and feed concentration, respectively.


Introduction
The scarcity of drinking water has been a global problem for years. Desalination of seawater and brackish water has become one of the most promising methods to produce fresh water [1][2][3][4][5]. Desalination is a process in which saline water is separated into two parts: one with a low concentration of dissolved salts, which is called fresh water, and another with a much higher concentration of dissolved salts than the original feed water, which is referred to as brine or concentrate [6]. Traditionally, water desalination was achieved via thermal distillation; however, recent years have witnessed rapid development and implementation of membrane technologies. This has led to the widespread installation of pressure-driven membrane methods such as reverse osmosis (RO). While RO technology has proven its worth as a feasible desalination technique, it produces large quantities of brine that need to be properly treated. Membrane distillation (MD) is the only membrane technology that can handle streams with salinity as high as that of the brine.
MD is a thermally driven method that separates water from a saline aqueous solution using a microporous hydrophobic membrane [7,8]. The temperature difference across the membrane creates a water vapour pressure gradient that serves as the driving force for pure water vapour transfer to the permeate side [9,10]. The transferred vapour then condenses onto a cold surface to produce pure water. MD has several advantages over traditional separation technologies, including a large evaporation area integrated into a membrane module, a lower operating pressure than pressure-driven membrane processes, the capacity to utilize low-grade heat energy, and the ability to treat highly contaminated water [1,4]. There are four MD configurations, namely direct contact membrane distillation (DCMD), air gap membrane distillation (AGMD), sweep gas membrane distillation (SGMD), and vacuum membrane distillation (VMD) [10][11][12][13]. VMD stands out among the MD systems due to its low heat loss and reduced temperature polarization effects [14,15]. The air on the permeate side is evacuated by applying a continuous vacuum below the equilibrium vapour pressure [16]. VMD can be used for a variety of purposes beyond seawater desalination, including ethanol recovery, the removal of trace pollutants, and the removal of volatile organic compounds (VOCs) from water [17,18]. A higher permeate water flux can be obtained with minor conductive heat loss across the membrane in the VMD setup [19]. As a result, VMD has received considerable attention in the field of water treatment [20].
In the realm of membrane science and technology, developing a mathematical model that can predict membrane separation processes is an efficient approach [21,22]. Such models are useful in the simulation and optimization of membrane systems, resulting in more efficient and cost-effective separation process designs [23]. Artificial neural networks (ANNs) are a multivariate regression modelling technique that can deal with linear and non-linear behaviours [24,25]. This methodology, which is classified as a "black-box" model, does not require explicit statements of the physical meaning of the system or process under investigation. With a limited set of experimental runs, such models allow researchers to investigate the link between the input variables and the process's targets or outputs [26]. Furthermore, with an appropriate experimental design, ANN models can be created easily [27].
ANN modelling has been used to forecast the performance of various membrane technology processes [28]. During dead-end MD, an ANN model was used to simulate permeate flux as a function of mixed liquor suspended solids, temperature, dissolved oxygen, hydraulic retention time, transmembrane pressure, and operating time [27]. According to the authors, the structure-optimized single-hidden-layer neural network was able to accurately reproduce the dynamic behaviour of the permeate flux. ANN modelling has also been used for membrane separation, where it was found that artificial neural networks can reliably predict real-world process behaviour with relatively low error (<5%). Permeate flux reduction in crossflow membranes for saline water removal was explored using ANN modelling; the ANN model was found to be capable of accurately predicting permeate flux from process factors such as transmembrane pressure, feed solution concentration, and membrane type [29]. ANN models have also been applied to the transient crossflow filtration of polydispersed suspensions and to the dynamic modelling of crossflow membranes.
According to recent review papers, researchers have focused mainly on the application of ANNs in seawater desalination [30]; in other words, other machine learning techniques remain largely unexplored in this field. Although the ANN is considered a robust prediction model, it needs a large number of data samples to be trained effectively [31], and conducting many experimental tests to obtain these samples is often laborious, difficult, and expensive. A new trend of research in optimization problems is to apply heuristic algorithms to optimize the parameters of predictive models and hence obtain highly accurate predictions. This work is dedicated to exploring the feasibility of one of these techniques, namely SVR-SHO, for predicting VMD performance and comparing the outcome with the results of common tools such as ANNs, classical SVR, and multiple linear regression. The work also applies a novel global sensitivity analysis to interpret the predicted results and select the most influential parameters with a large impact on the flux. To the authors' best knowledge, SVR-SHO has not previously been tested for predicting VMD behaviour; this work is the first attempt to do so.

Experimental Work
The VMD system was designed to collect experimental data for assessing the system's performance by examining the impacts of the operating variables (feed temperature, feed flow rate, feed concentration, and vacuum-side pressure) on permeate flux for the desalination procedure. A stainless-steel tube, 20 cm in length with a 10 mm outer diameter and a 9 mm inner diameter, was used to make the hollow fibre module, which was attached to a Swagelok tube fitting (10 mm) on each end. The hollow fibre module was selected because it has more potential for scalability than the flat sheet membranes commonly used in experimental setups. In the stainless-steel tube, commercial hollow fibre polypropylene (PP) membranes [ACCUREL PP S6/2, Membrane GmbH] were glued with an epoxy resin (Euxi 50KII hardener). To check for leaks in the membrane module or the system's connections, distilled water was cycled for 10-15 min. A water bath was used to heat the salt solution to the required temperature. A peristaltic pump was used to circulate the salt solution through the membrane module's lumen side. Water vapour was drawn from the shell side of the module into the condenser by a vacuum pump, and the water was collected in a glass trap. The temperatures at the module's inlet and outlet ports were recorded and stored on a computer. The feed's hydrostatic pressure was regulated manually via a control valve so that it did not exceed the liquid entry pressure (LEP) of the membrane and did not wet the membrane pores. Permeate flux was calculated using Equation (1):

J = ρV/(t·A)    (1)

where J is the permeate flux (kg/m²·h), V is the volume of fresh water (L), ρ is the water density (kg/L), t is the operational time (h), and A is the membrane's effective area (m²), which can be calculated using Equation (2):

A = π·d_i·L    (2)

where d_i is the internal diameter of the fibre (m) and L is the effective length of the fibre (m).
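A small helper makes the flux calculation concrete. The relations J = ρV/(t·A) and A = π·d_i·L follow the variable definitions given above; the numeric values below are illustrative only, not taken from the paper's runs:

```python
import math

def membrane_area(d_i, L):
    """Effective inner surface area of a single hollow fibre: A = pi * d_i * L (m^2)."""
    return math.pi * d_i * L

def permeate_flux(V, rho, t, A):
    """Permeate flux J = rho * V / (t * A), in kg/(m^2.h)."""
    return rho * V / (t * A)

# Hypothetical single fibre: 1.8 mm inner diameter, 20 cm effective length.
A = membrane_area(d_i=0.0018, L=0.2)
# 0.05 L of fresh water collected over 5 h at density 1.0 kg/L.
J = permeate_flux(V=0.05, rho=1.0, t=5.0, A=A)
```

For a module with several fibres, the area of one fibre would be multiplied by the number of fibres.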
After around 20 min, the VMD process reached a steady state, and the process then ran for another 5 h. Every 20 min, the measured amount of permeate was added back to the feed solution to maintain the same feed level and concentration in the conical flask. The feed temperature, feed flow rate, feed concentration, and absolute pressure were all varied across fifty tests.

Support Vector Regression
Support vector machines (SVMs) are sophisticated kernel-based machine learning techniques which have been successfully applied to solve a wide range of classification, numerical prediction, density estimation, and pattern recognition problems [32]. The technique uses the structural risk minimization principle to obtain an N-dimensional hyperplane with wide margins to classify the data into predefined groups [33]. As shown in Figure 1, the data points located in close proximity to the hyperplane are called support vectors, and lying at the centre of the margin is the optimal separating hyperplane that maximizes the separation gap for the training data points. A crucial aspect of SVMs is the incorporation of kernel functions (radial basis function, polynomial, Fisher, and Bayesian), which transform a non-linear decision surface in a lower-dimensional space into a linear equation in a higher-dimensional space. For a given training dataset D,

D = {(x_j, y_j), j = 1, 2, …, z | y_j ∈ {+1, −1}, x_j ∈ R^n}    (3)

the separating hyperplane in 'z' dimensions, w·x + B = 0 with w = Σ_j z_j y_j x_j (Equation (4)), can be determined, where x_j represents the input expressed as an n-dimensional real vector, y_j is the output whose value is either +1 or −1, and z_j is an SVM-generated multiplier. Once the hyperplane is established, a new input q is categorized using Equation (5):

f(q) = sign(Σ_j z_j y_j K(x_j, q) + B)    (5)

where K(x_j, x) represents the kernel function and B is the bias. The kernel function maps the input dataset to a set of features, as shown in Figure 1.
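Kernel-based regression can be illustrated on a toy problem. The paper's models were built in MATLAB; scikit-learn is used here purely as a stand-in, and the dataset is synthetic:

```python
import numpy as np
from sklearn.svm import SVR

# Toy non-linear target y = x^2 with noise: an RBF kernel lets an
# epsilon-tube linear fit in feature space capture the curvature.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.1, size=80)

# C is the box constraint, epsilon the tube width - the hyperparameters
# the paper later tunes with SHO.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)
pred = model.predict([[2.0]])  # should lie close to 4
```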



Feed-Forward Neural Networks (FFNNs)
Feed-forward neural networks are one of the most basic forms of neural networks, in which information propagates only in the forward direction (no back-loops) [34]. The network architecture is composed of many simple processing elements known as neurons, which are generally arranged into a sequence of three fully connected layers, i.e., an input layer, a hidden layer, and an output layer. A schematic diagram of an FFNN is shown in Figure 2. Each neuron in a layer is conjoined in a unidirectional network via weighted pathways called interconnections. The strengths of these interconnections between the neurons correspond to the adaptable synaptic weights. The incoming signals (inputs), multiplied by these weights, are first summed and then passed through an activation function to generate the final outcome (Equations (6) and (7)).
T_net = Σ_j w_j Y_j + β    (6)

output = f(T_net)    (7)

where T_net is the summation of the weighted inputs, f(T_net) is the non-linear activation function, Y_j is the input neuron, w_j is the weight coefficient, and β is the bias. The simulated outputs of the network are then compared with the actual observations, and the network error E_r, the difference between the predicted value and the observed value O_j of the jth neuron, is computed.
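The single-neuron computation of Equations (6) and (7) can be sketched directly; the tanh activation and the numeric weights below are assumptions chosen for illustration:

```python
import numpy as np

def forward(Y, w, beta, f=np.tanh):
    """One neuron: T_net = sum_j w_j * Y_j + beta, output = f(T_net)."""
    T_net = np.dot(w, Y) + beta
    return f(T_net)

Y = np.array([0.5, -1.0, 2.0])   # incoming signals
w = np.array([0.2, 0.4, 0.1])    # synaptic weights
out = forward(Y, w, beta=0.05)   # T_net = -0.05, output = tanh(-0.05)
```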


Multiple Linear Regression
Multiple linear regression (MLR) is a generalized linear technique which attempts to model the association between the target variable and the predictor variables using linear combinations of the latter. The technique leads to a clearer and more precise understanding of the relationship between each individual predictor and the target, and of the relationships among the input predictors themselves. MLR is one of the most intuitive forecasting approaches, presenting several advantages such as lower computational cost, simple model structures, and parsimonious input data requirements in comparison with physical models [35]. A general MLR model can be developed using the following equation:

T = Y_0 + Y_1·X_1 + Y_2·X_2 + … + Y_n·X_n + E_r

where T denotes the dependent variable or target, Y_i (i = 0, 1, 2, …, n) represents the regression coefficients, X_i is the independent variable, Y_0 is a constant value (intercept), and E_r is the error term.
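The MLR fit can be illustrated with ordinary least squares; the synthetic predictors and coefficients below are hypothetical:

```python
import numpy as np

# T = Y0 + Y1*X1 + ... + Yn*Xn + Er, fitted by ordinary least squares.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))                 # three predictors, 50 samples
true_coef = np.array([2.0, -1.0, 0.5])
T = 4.0 + X @ true_coef + rng.normal(0, 0.01, 50)

A = np.column_stack([np.ones(50), X])        # prepend intercept column
coef, *_ = np.linalg.lstsq(A, T, rcond=None)
# coef[0] recovers the intercept ~4.0; coef[1:] the slopes ~[2.0, -1.0, 0.5]
```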

Spotted Hyena Optimizer
The spotted hyena optimizer (SHO) is a state-of-the-art meta-heuristic optimization technique based on the foraging behaviour of spotted hyenas in nature [36]. The technique presents better outcomes for complex non-linear problems and proves to be an efficient constraint-handling method in comparison with other algorithms [37]. The SHO mathematically depicts the hunting behaviour of spotted hyenas, which consists of four key steps, i.e., encircling, hunting, attacking, and searching for prey, via a set of empirical equations. The following sections present the equations used to describe each step.

Encircling
The spotted hyenas encircle the prey (current best solution) and, based on the prey's location, they update their positions to acquire the target. The encircling behaviour in SHO is represented by the following set of equations:

L_h = |C_1 · Y_p(t) − Y(t)|    (10)

Y(t + 1) = Y_p(t) − C_2 · L_h    (11)

where L_h represents the distance a spotted hyena needs to cover to reach its prey, t is the ongoing iteration, Y_p(t) represents the location of the prey, Y(t) is the spotted hyena location, and C_1 and C_2 are coefficients computed using Equations (12)-(14):

C_1 = 2 · v_1    (12)

C_2 = 2h · v_2 − h    (13)

h = 5 − Iteration · (5/Max_iteration)    (14)

where Iteration = 0, 1, 2, …, Max_iteration. Here, h is linearly decreased from 5 to 0 over the course of the iterations, and v_1 and v_2 are vectors of random numbers generated in the range [0, 1].

Hunting
The hunting behaviour of the SHO is modelled using the following equations:

L_h = |C_1 · Y_h − Y_k|    (15)

Y_k = Y_h − C_2 · L_h    (16)

N_h = Y_k + Y_{k+1} + … + Y_{k+N}    (17)

where Y_h is the first best-spotted hyena's position, Y_k represents the positions of the other hyenas, and N denotes the number of spotted hyenas, which may be further computed as

N = count_nos(Y_h, Y_h+1, Y_h+2, …, (Y_h + M))    (18)

where M is a randomly generated number whose value lies within [0.5, 1], count_nos gives the number of solutions, and N_h denotes the batch of N optimal solutions.

Attacking
Considering the above-mentioned equations, the attacking behaviour of the SHO can be represented as follows:

Y(t + 1) = N_h / N    (19)

where Y(t + 1) saves the best solution (hyena position) and helps update the positions of the others.

Searching
The search for prey depends on the location of the spotted hyena group represented by the vector N_h. To hunt effectively, the spotted hyenas must spread out and move away from each other. In the SHO algorithm, the vector C_2 is varied randomly with values > 1 or < −1, which allows the search agents (hyenas) to move further from the prey; conversely, the agents tend to move towards the prey when |C_2| < 1. This process gives the SHO more randomization and helps it acquire the global optimum.
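The encircling, hunting, and attacking steps above can be sketched as a minimal optimizer. This is a simplified reading in which each hyena moves relative to the single current best solution rather than the full cluster N_h, so it illustrates the mechanics only and is not the authors' implementation:

```python
import numpy as np

def sho(obj, dim=2, n_hyenas=20, max_iter=100, lb=-5.0, ub=5.0, seed=0):
    """Simplified spotted hyena optimizer sketch (minimization)."""
    rng = np.random.default_rng(seed)
    Y = rng.uniform(lb, ub, (n_hyenas, dim))        # hyena positions
    best = min(Y, key=obj).copy()                   # prey = current best solution
    for it in range(max_iter):
        h = 5.0 - it * (5.0 / max_iter)             # h decreases linearly from 5 to 0
        for k in range(n_hyenas):
            C1 = 2.0 * rng.random(dim)              # C1 = 2 * v1
            C2 = 2.0 * h * rng.random(dim) - h      # C2 = 2h * v2 - h
            Lh = np.abs(C1 * best - Y[k])           # encircling distance
            Y[k] = np.clip(best - C2 * Lh, lb, ub)  # move toward/away from prey
            if obj(Y[k]) < obj(best):               # greedy update of the prey
                best = Y[k].copy()
    return best

# Sphere function: optimum at the origin.
best = sho(lambda y: float(np.sum(y ** 2)))
```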

Model Development
In this study, several AI models were established, namely RF, SVR, ANN, and a hybrid model (SVR-SHO) incorporating SVR and the hyena algorithm. A statistical model, MLR, was also used. A total of 50 experimental samples were utilized for model construction. Of these, 33 samples (66%) were randomly selected for training the models, while the remaining 17 samples (34%) were reserved for validation purposes. Detailed statistical descriptions of both the training and testing data can be found in Table 1. The RMSE was used as the objective function for optimization. The hyperparameters of the single models (ANN, RF, and SVR) were selected via trial and error, whereas the hyena algorithm was employed to tune the hyperparameters of the SVR (i.e., the kernel parameters, box constraint, and epsilon coefficient) [33,38], which have a significant impact on model accuracy and performance. The model parameters that minimize the RMSE throughout the training phase were used to create and evaluate the corresponding model. MATLAB was used to establish all the models in this study. Figure 3 shows the main steps of model development.
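The tuning loop can be illustrated as follows. Since the experimental data are not reproduced here, a synthetic 50-sample dataset stands in for the VMD measurements, and a plain random search stands in for SHO; only the 33/17 split and the RMSE objective follow the paper:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical stand-in for the 50 VMD samples: 4 inputs -> flux-like target.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (50, 4))
y = 50 * X[:, 0] + 10 * X[:, 1] - 15 * X[:, 2] + 8 * X[:, 3] + rng.normal(0, 1, 50)

# 33 training / 17 testing samples, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=33, random_state=0)

# Random search over (C, epsilon, gamma) as a simple stand-in for SHO,
# minimizing the training RMSE as the objective.
best_rmse, best_params = np.inf, None
for _ in range(60):
    params = dict(C=10 ** rng.uniform(-1, 3),
                  epsilon=10 ** rng.uniform(-3, 0),
                  gamma=10 ** rng.uniform(-2, 1))
    m = SVR(kernel="rbf", **params).fit(X_tr, y_tr)
    rmse = mean_squared_error(y_tr, m.predict(X_tr)) ** 0.5
    if rmse < best_rmse:
        best_rmse, best_params = rmse, params
```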
In order to assess the suggested models, several statistical fitting indicators were used, namely the correlation coefficient (R), coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE%), and the Willmott index of agreement (WI). These indicators are commonly used to evaluate the performance of AI-based models [39] and can provide deep insight into the accuracy of the predicted results [40]. The mathematical expressions of these indicators are provided below [41][42][43][44][45], where X_i^obs and X_i^pred are the actual and predicted permeate flux of the ith sample, X̄^obs and X̄^pred are the average values of the actual and predicted permeate flux, and n is the total number of samples.
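The listed indicators can be computed directly. The formulas below follow their standard definitions (Willmott's index in its common form), which are assumed to match those used in the paper:

```python
import numpy as np

def metrics(obs, pred):
    """R, RMSE, MAE, MAPE% and Willmott index (WI) for model assessment."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r = np.corrcoef(obs, pred)[0, 1]
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    mae = np.mean(np.abs(obs - pred))
    mape = 100 * np.mean(np.abs((obs - pred) / obs))   # assumes obs != 0
    wi = 1 - np.sum((obs - pred) ** 2) / np.sum(
        (np.abs(pred - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return {"R": r, "RMSE": rmse, "MAE": mae, "MAPE%": mape, "WI": wi}

m = metrics([10, 20, 30, 40], [11, 19, 32, 38])
```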

Experimental Results
Permeate flux increases with increasing feed temperature, as shown in Figure 4, because the actual driving force for MD is the vapour pressure difference across the membrane, which increases with increasing temperature. It can be noticed that the permeation flux required about 20-30 min to reach a steady state; the flux then remained approximately constant until 300 min into the VMD process, indicating that there were no membrane-fouling effects, since the module was washed after each run with distilled water. The effects of feed temperature on permeate conductivity and salt rejection for a solution of 35 g/L NaCl are illustrated in Figure 5. Permeate conductivity increased slightly with increasing feed temperature. This observation shows that temperature has a minor effect on the membrane pore-wetting process; permeate conductivity remained below 10 µS/cm, with a salt rejection of about 99.99%. The permeate conductivity for a concentration of 30 g/L NaCl was between 1.1 and 8.5 µS/cm at temperatures ranging between 60 and 80 °C. Figure 6 shows the behaviour of the inlet and outlet feed temperatures with time for the VMD system during 300 min of operation at different feed temperatures. This figure also shows that the system required approximately 20-30 min for the inlet and outlet feed temperatures to reach a steady state.

The effects of absolute pressure (in the vacuum zone) on permeate flux at different feed temperatures are shown in Figure 7. The absolute pressure was varied from 12.7 to 28 kPa (abs) at different feed temperatures (45, 57 and 65 °C), while the feed flow rate and salt concentration were maintained constant at 0.6 L/min and 35 g/L, respectively. It was found that permeate flux decreases considerably as the absolute pressure at the permeate side increases. The permeate flux declined by about 15-48 kg/m²·h when the absolute pressure increased from 12.7 to 28 kPa (abs) at the different feed temperatures, which indicates the obvious effect of absolute pressure on MD flux. The influence of air in the membrane pores on water vapour diffusion through the pores can be neglected in VMD; thus, the conduction heat transfer across the membrane can be neglected, and this leads to an increase in permeate flux.


Artificial Intelligent-Based Models
In this section, the performance of the proposed models was evaluated for both the training and testing phases using various statistical metrics and graphical representations. The performance of MLR, ANN, SVR, and SVR-SHO during the training phase is summarized in Table 2, where the SVR-SHO model produced lower prediction errors and higher prediction accuracy (R ≈ 0.894 and WI ≈ 0.942) than the other models. The results of the training phase reflect the learning ability of the proposed models. To this end, it is essential to test the performance of the proposed models during the testing phase. The primary motivation for assessing the models in the testing phase is that they receive only the input parameters, whereas in the training phase they receive both the input parameters and the corresponding target values; this also demonstrates the potency of the proposed models as predictive tools [1]. In this regard, Table 3 shows the performance of the proposed models during the testing phase. According to Table 3, the SVR-SHO model still provides a robust prediction, producing the lowest prediction errors (MAE ≈ 3.278 kg/m²·h, RMSE ≈ 3.931 kg/m²·h, and MAPE ≈ 11.5%) and the highest prediction accuracy (R ≈ 0.971 and WI ≈ 0.983), consistent with the findings from the training phase. On the other hand, the vanilla SVR model showed a significantly poorer performance, producing a high error rate (MAE ≈ 10.087 kg/m²·h, RMSE ≈ 11.292 kg/m²·h, and MAPE ≈ 34.9%) and lower prediction accuracy (R ≈ 0.801 and WI ≈ 0.709), which indicates that an acceptable error in the training phase does not imply that a model will perform well in the testing phase. The results also showed that, despite the poor performance of the ANN model during the training phase, in the testing phase the ANN model showed a significant improvement: the prediction error was reduced by 16.906, 43.386, and 63.475% for MAE, RMSE, and MAPE, respectively, and the prediction accuracy increased by 28.187 and 10.975% for R and WI, respectively. Moreover, the ANN model outperformed both the SVR and MLR models during the testing phase.
Several graphical presentations were created for a better performance assessment of the proposed models during the testing phase. Figure 8a presents a boxplot showing the prediction error distribution for the proposed models. The boxplot includes a box drawn between the first quartile (Q25%) and the third quartile (Q75%), a span also known as the interquartile range (IQR). The horizontal line in the box represents the median of the data, also known as the second quartile (Q50%). The lines extending from the first and third quartiles are called the whiskers and indicate the variability outside these quartiles, while outliers are drawn as individual points. According to Figure 8a,b, the SVR-SHO model provides a smaller IQR (IQR = 5.388), which indicates that the SVR-SHO model generates minor prediction errors. On the other hand, the spread of prediction errors in the vanilla SVR model is significantly higher (IQR = 18.91), which indicates the poor performance of the SVR in predicting the permeate flux.
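The quartile and IQR quantities described above can be computed directly; the residuals below are hypothetical:

```python
import numpy as np

# Hypothetical prediction residuals for one model.
errors = np.array([-4.1, -2.0, -0.5, 0.3, 1.2, 2.8, 3.9, 6.5, 18.0])

q25, q50, q75 = np.percentile(errors, [25, 50, 75])
iqr = q75 - q25
# The usual boxplot convention: points beyond 1.5*IQR from the box edges
# are drawn as individual outliers.
outliers = errors[(errors < q25 - 1.5 * iqr) | (errors > q75 + 1.5 * iqr)]
```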
Figure 9 shows the scatter plots that examine the proposed models' prediction efficiency in the testing phase. To obtain further insight into the capacity of the proposed models to predict the permeate flux, the relative error (RE%) diagram in Figure 10, based on the testing phase, was constructed. According to Figure 11, the SVR-SHO model predicts 80% of the dataset with an RE% ranging between −20 and 20%, while three observations show an RE% of more than 20%. A possible explanation for this result is that no related information was incorporated into the training data to replicate those three observations at their actual magnitude. Overall, the SVR-SHO model provides good results in terms of RE%; conversely, the SVR model provides the poorest performance in terms of RE%.
The similarities between the predicted and actual FP values were graphically presented in a Taylor diagram (Figure 12) for further assessment. The Taylor diagram highlights the efficiency of the proposed models: a series of points (models) is visualized on a polar plot based on the correlation coefficient and the standard deviation. Furthermore, it uses the ratio of variances to relate the spread of the predicted variations to that of the actual ones. Evidently, Figure 12 shows that the SVR-SHO model lies closer to the observed point than the other models. Notably, the hyperparameters of all applied models are provided in Table 4.
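The two quantities that place each model on the Taylor diagram can be computed as follows; this is a generic sketch of the standard construction, not the plotting code used in the paper:

```python
import numpy as np

def taylor_coordinates(actual, predicted):
    """The two quantities that place a model on a Taylor diagram:
    the standard deviation of its predictions and the Pearson
    correlation with the observations. The observed series itself
    plots at (std(actual), r = 1)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = float(np.corrcoef(actual, predicted)[0, 1])
    return float(np.std(predicted)), r
```

A model whose point lies close to the observed reference point, as SVR-SHO does in Figure 12, has both a correlation near 1 and a standard deviation near that of the observations.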
The results clearly demonstrate that the hybrid model exhibited outstanding performance and yielded more precise results than traditional models such as ANN, MLR, and classical SVR. The enhancement in predictive capability can be attributed to the synergistic combination of the two algorithms, SVR and SHO. SHO proved particularly effective in optimizing the hyperparameters of SVR, enabling the model to make robust predictions of permeate flux even when the available data samples were limited. This effectiveness is likely due to SHO's ability to efficiently solve optimization problems by striking a balance between exploration and exploitation, thus discovering global solutions [46]. Additionally, SHO is widely recognized for its simplicity, adaptability, and robustness, consistently delivering competitive results compared to other metaheuristics. By thoroughly exploring and exploiting the search space, SHO can identify optimal solutions and locate global optima even in complex problem environments, resulting in high-quality solutions that approach global optimality.
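To make the exploration/exploitation idea concrete, the sketch below shows a heavily simplified, population-based search in the spirit of SHO, wrapped around a cross-validated SVR objective. This is an illustrative approximation of the SHO encircling update (with an assumed control-factor schedule and log-scale C/gamma bounds), not the authors' implementation:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def sho_like_search(objective, bounds, pop=6, iters=15, seed=0):
    """Simplified SHO-style maximization: candidates encircle the current
    best solution, with a control factor h that decays linearly to zero
    so early iterations explore and later ones exploit."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    positions = rng.uniform(lo, hi, size=(pop, len(bounds)))
    scores = np.array([objective(p) for p in positions])
    best = positions[scores.argmax()].copy()
    best_score = scores.max()
    for t in range(iters):
        h = 5.0 - t * (5.0 / iters)              # control factor decays to 0
        for i in range(pop):
            B = 2.0 * rng.random(len(bounds))    # swarm factor
            E = 2.0 * h * rng.random(len(bounds)) - h
            D = np.abs(B * best - positions[i])  # distance to the best hyena
            positions[i] = np.clip(best - E * D, lo, hi)
            scores[i] = objective(positions[i])
        if scores.max() > best_score:            # keep the best solution found
            best_score = scores.max()
            best = positions[scores.argmax()].copy()
    return best, best_score

def svr_cv_fitness(X, y):
    """Cross-validated R^2 of an RBF SVR; the search works in log10 space
    for C and gamma (an assumed, illustrative parameterization)."""
    def objective(params):
        c, g = 10.0 ** params
        return cross_val_score(SVR(kernel="rbf", C=c, gamma=g),
                               X, y, cv=3, scoring="r2").mean()
    return objective

# Example: tune (log10 C, log10 gamma) over [-2, 3] x [-2, 3] on toy data.
X = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
best_params, best_cv = sho_like_search(svr_cv_fitness(X, y),
                                       bounds=[(-2, 3), (-2, 3)],
                                       pop=4, iters=5)
```

The decaying factor h drives the transition from wide exploration to local exploitation that the paragraph above credits for SHO's ability to escape poor local optima.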

Impact of Operating Parameters on Flux
Considering the high accuracy of the predictions obtained via the SVR-SHO model, only this model was used to interpret the FP data. For the SVR-SHO model, global sensitivity analysis (GSA) was adopted to obtain the relative importance of each input parameter. GSA is a method based on the Monte Carlo simulation approach, used to prioritize the importance of parameters in a given modelling exercise. It is a variance-based technique that proportionally attributes the uncertainty of the model output to the uncertainty of each input parameter. The fundamental principle behind GSA is to fix all input parameters at known values except the one parameter to be evaluated, and then use the available formula to determine the output weight corresponding to each input parameter [47]. In a practical sense, GSA can be applied to any supervised machine learning algorithm for regression tasks. Figure 13 shows the relative importance of the four input components obtained via SVR-SHO combined with GSA. The results show that the feed temperature is the most impactful parameter with respect to flux pressure, with a relative importance of 52.71%, followed by the feed flowrate and vacuum pressure with relative importances of 17.69% and 17.16%, respectively. On the other hand, feed concentration has only a minor influence on the results, with a relative importance of 12.44%.
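A minimal sketch of the fix-all-but-one, Monte Carlo idea described above is given below. It is one common one-at-a-time variant of variance-based sensitivity analysis, not necessarily the authors' exact procedure; the model and ranges are illustrative:

```python
import numpy as np

def gsa_relative_importance(model, X, n_samples=2000, seed=0):
    """One-at-a-time Monte Carlo sensitivity: vary each input over its
    observed range while holding the others at their means, and express
    the resulting output variances as percentages. `model` is any object
    with a scikit-learn style predict(X) method."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    means, lows, highs = X.mean(axis=0), X.min(axis=0), X.max(axis=0)
    variances = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        samples = np.tile(means, (n_samples, 1))
        samples[:, j] = rng.uniform(lows[j], highs[j], n_samples)
        variances[j] = np.var(model.predict(samples))
    return 100.0 * variances / variances.sum()
```

Applied to the trained SVR-SHO model with the four operating parameters as columns of X, this yields a percentage breakdown of the kind plotted in Figure 13.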


Conclusions
A new hybrid model called SVR-SHO has been created to predict the flux pressure of VMD. This model combines the SHO algorithm with SVR and was developed and tested using experimental data. The performance of SVR-SHO was compared with other commonly used machine learning models, such as ANN and classical SVR, and a statistical model, MLR. The prediction results showed that the SVR-SHO model outperformed the other models, achieving high accuracy with an R of 0.94. Additionally, a global sensitivity analysis was conducted to interpret the predictive results, revealing that the feed temperature was the most influential parameter affecting the flux pressure.
The results achieved with the SVR-SHO model and global sensitivity analysis underscore their potential as valuable tools for studying membrane performance more effectively. By leveraging software-driven data and advanced modelling techniques such as SVR-SHO and global sensitivity analysis, valuable insights can be gained into the complex behaviour of membrane systems. Such approaches also enable more informed decisions and optimized experimental setups, leading to improved efficiency and performance. Thus, it would be beneficial to further test the SVR-SHO model in more complex systems that involve hybrid membrane technologies and membrane properties that change with fouling accumulation from different contaminants.

Figure 1. SVM features: (a) optimal hyperplane and the support vectors, and (b) mapping of data using SVM.


Figure 3. A flowchart showing the primary steps for model development.


Figure 4. Flux variation with time at different feed temperatures, 0.6 L/min feed flow rate, 12.7 kPa (abs) absolute pressure, and 35 g/L feed solution.


Figure 5. Effect of feed temperature on permeate conductivity and salt rejection for solution at 35 g/L, 0.6 L/min feed flow rate, and 12.7 kPa (abs) absolute pressure.

Figure 6. Inlet and retentate temperature behaviour of the VMD system over 300 min of operation.



Figure 7. Effect of absolute pressure on permeate flux for salt solution at 35 g/L, 0.6 L/min feed flow rate, and different feed temperatures.


Figure 8. Boxplot of the prediction errors of the proposed models during the testing phase: (a) boxplot diagram; (b) quartile range of the prediction error for each model (Q75, Q50, Q25, IQR).



Figure 9. Scatter plot of actual and predicted FP values obtained via the proposed models and experimental data: training phase.


Figure 10. Scatter plot of actual and predicted FP values obtained via the proposed models and experimental data: testing phase.

Figure 11. The relative error distribution diagrams of the proposed models during the testing phase.


Figure 12. Graphical visualization of the Taylor diagram for the proposed models.


Figure 13. The relative importance of the four input components using the SVR-SHO model.


Table 1. The statistical description of the used data.
It can be observed from Table 2 that the SVR optimized with the SHO meta-heuristic algorithm (SVR-SHO) provides the best performance in flux pressure (FP) prediction, producing the lowest prediction errors (MAE ≈ 2.262, MAPE ≈ 0.149) and the highest prediction accuracy (R ≈ 0.946 and WI ≈ 0.971). Moreover, the results showed that the ANN model performs significantly poorly, producing a high rate of prediction errors (MAE ≈ 8.238, MAPE ≈ 0.684) and lower prediction accuracy (R ≈ 0.704 and WI ≈ 0.834). The results showed the superiority of the SHO meta-heuristic algorithm in decreasing the prediction error (76.562% for MAE, 47.386% for RMSE, and 71.673% for MAPE) and increasing the prediction accuracy (9.364% for R, 19.141% for WI) of the vanilla SVR model. On the other hand, the MLR model outperformed the vanilla SVR model in FP prediction by producing lower prediction errors (MAE ≈ 6.554 kg/(m²·h), RMSE ≈ 18.607 kg/(m²·h), and MAPE ≈ 0.319) and higher prediction accuracy (R ≈ 0.894 and WI ≈ 0.942). Overall, the SVR-SHO model reduced the prediction errors by ≈65.487–76.561% in terms of MAE, ≈26.454–60.373% in terms of RMSE, and ≈53.321–78.232% in terms of MAPE, while it improved the prediction accuracy by ≈5.866–34.496% in terms of R and ≈3.111–19.155% in terms of WI compared to the other models, which indicates the high capability of the model to capture the behaviour of FP.
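The error and accuracy metrics quoted above can be computed as follows; the formulas are the standard textbook definitions of MAE, RMSE, MAPE, Pearson R, and the Willmott index (assumed here, since this excerpt does not list the paper's exact formulas):

```python
import numpy as np

def evaluate(actual, predicted):
    """MAE, RMSE, MAPE, Pearson R, and Willmott index (WI),
    using the standard definitions."""
    o = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    mae = float(np.mean(np.abs(o - p)))
    rmse = float(np.sqrt(np.mean((o - p) ** 2)))
    mape = float(np.mean(np.abs((o - p) / o)))
    r = float(np.corrcoef(o, p)[0, 1])
    wi = float(1.0 - np.sum((o - p) ** 2)
               / np.sum((np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2))
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R": r, "WI": wi}
```

MAE, RMSE, and MAPE decrease toward 0 for better models, while R and WI increase toward 1, which is why SVR-SHO's (MAE ≈ 2.262, WI ≈ 0.971) dominates the other entries in the table.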

Table 2. The performance of the proposed models during the training phase.
Table 2 also shows the superiority of the SHO algorithm in the SVR-SHO hybrid model, reducing the error of the vanilla SVR model by ≈67.504%, 65.189%, and 66.925% in terms of MAE, RMSE, and MAPE, respectively. Furthermore, the hybrid model enhanced the vanilla model's prediction accuracy by 21.153% and 38.672% in terms of R and WI, respectively.

Table 3. The performance of the proposed models during the testing phase.