1. Introduction
Liquid wastes from municipal and desalination activities in coastal areas are often discharged into the receiving water body in the form of wastewater jets [
1,
2,
3,
4,
5,
6]. Such jets are referred to as buoyant jets if their densities are lower than that of the ambient water. Wastewater discharges can seriously harm the marine environment and ecology, so a better understanding of the dilution and mixing properties of buoyant jets is important for the efficient design of outfall systems and the accurate evaluation of environmental and ecological impacts.
After discharge into the receiving water body, an effluent jet spreads and mixes with the ambient water because of advective and diffusive transport within the jet and the shear stresses at the interface between the jet and the ambient water. Unlike a turbulent plume, which is driven entirely by buoyancy, a buoyant jet is driven by both momentum and buoyancy, and thus the well-established turbulent plume theory [
7,
8,
12] cannot be applied directly to wastewater jets. Depending on its dynamic interactions with neighboring boundaries, a buoyant jet can be classified as either free or confined. When a jet is far from any boundary, it can spread freely in and mix with the ambient water, so rapid jet spreading and ambient water entrainment effectively dilute the wastewater effluent. In many practical applications, however, jets are subjected to the effects of boundaries [
10,
11,
12], such as the jets issued from outfall systems that are in dredged trenches or those with protective riser tubes (
Figure 1). In such applications, the confinement can suppress the jet spread and restrict the intrusion of ambient water into the jet, and thus affect the jet dilution. Such effects make the dilution and mixing processes more complicated and require further investigation.
Confined jets have been extensively studied in the past several decades due to both academic and practical interests. Experimental works on confined jets have involved the measurements of flow and mixing properties of a gas jet injected from an orifice into the gas flow in a rectangular-sectioned duct [
13], PIV (particle image velocimetry) measurements of the turbulence distribution in the bulk free region of a vertically injected confined jet [
14], LIF (laser-induced fluorescence) measurements of concentration distribution of a vertical buoyant jet subjected to lateral confinement [
10], PIV measurements of the velocity field of a horizontal jet subjected to vertical confinement [
15], and PIV and planar LIF measurements of the flow and mixing parameters of multi-lateral jets discharged into a round pipe flow [
16]. A limited number of previous modeling works also involved the development of theoretical or numerical models for confined jets. Jirka [
17] presented an integral model that can model the jet mixing properties with both the lateral and vertical confinement being considered. El-Amin et al. [
18] conducted numerical modeling of the flow and temperature fields of a two-dimensional buoyant confined jet injected vertically into a cylindrical tank. More recently, Yan and Mohammadian [
19] simulated a laterally confined vertical buoyant jet as a three-dimensional phenomenon by solving the full Navier-Stokes equations. These experimental and numerical studies have provided comprehensive and reliable data sets for a better understanding of confined jets and promising tools for estimating the flow and mixing properties of confined jets. However, such approaches are typically too expensive or time-consuming for practical applications, and thus a more efficient predictive tool would be very beneficial.
Over recent years, artificial intelligence (AI) techniques have been widely introduced to solve water and marine engineering problems. For example, Kisi et al. [
20] presented the earliest application of the adaptive neuro fuzzy inference system (ANFIS) to estimate sediment in rivers; Hipni et al. [
21] employed the support vector machine (SVM) and ANFIS to forecast daily dam water levels; Rezaei et al. [
22,
23] proposed the fuzzy Multi-Objective Particle Swarm Optimization (f-MOPSO) algorithm for conjunctive water use management; and Bashiri et al. [
24] used the harmony search algorithm and artificial neural networks (ANN) to predict local scour depth downstream of sluice gates. Recently, Moroni et al. [
25] reviewed the literature regarding an environmental decision support system for oil spill management, and stated that the provision of support services is generally based on AI paradigms. Compared with some other AI techniques, such as ANN and ANFIS, the genetic programming (GP) algorithm has the advantage of being able to automatically evolve an explicit mathematical model. One of the most recent variants of GP is the multigene genetic-programming (MGGP) technique [
26,
27]. An MGGP model is a combination of multiple genes, each gene being a traditional GP tree. The MGGP technique has become very popular in the past several years. Garg et al. [
28] developed some MGGP-based soil water retention curve models for three different soils. These models describe the water content as functions of net stress and suction, and the results showed that the predictions made by these MGGP-based models matched the measurements very well. Kaydani et al. [
29] applied several AI techniques, including ANN, ANFIS, traditional GP, and MGGP, to predict the permeability in a heterogeneous oil reservoir. Their study demonstrated that the MGGP technique was more advantageous than the other AI techniques in permeability estimation in terms of providing a relatively compact model and avoiding structural dependency. More recently, Safari and Mehr [
27] developed a Pareto-optimal model using MGGP to model particle Froude numbers in large sewers. Their results demonstrated that the proposed MGGP model can provide more accurate predictions than the existing conventional regression models. These studies encouraged the application of AI techniques to water-related problems.
Despite the generalization and predictive capabilities of AI techniques, they have rarely been applied to predicting the dilution properties of wastewater effluents. To the best of the authors’ knowledge, an MGGP-based model for the initial dilution of laterally confined vertical buoyant jets has not been reported in the literature. Therefore, the present paper develops MGGP-based models that predict the jet concentration as a function of the jet densimetric Froude number, Fr, the confinement index, β, and the location of the cross section, Z/D. Three of the Pareto-optimal models are compared with experimental data for both the training and testing periods, and the best model is determined with both performance and simplicity considered. The best MGGP-based model is also compared with a single-gene genetic-programming (SGGP)-based model and an existing regression-based empirical model, and the results demonstrate the superiority of the MGGP-based model.
3. Results and Discussion
3.1. Pareto-Optimal MGGP Models
Figure 3a illustrates the performance-complexity trade-off of the evolved MGGP models for dimensionless jet centerline concentration. Determining the best model needs to consider two conflicting objectives simultaneously: accuracy and simplicity. Therefore, the Pareto optimization method was utilized. The green circles in
Figure 3a denote the non-dominated solutions (i.e., there exists no solution that is more accurate and less complex at the same time) and the corresponding evolved MGGP models are referred to as the Pareto-optimal models [
27]. These models are regarded as optimal because a solution cannot improve the model performance without increasing the complexity, and vice versa.
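The non-dominated filtering described above can be sketched as follows. This is a minimal illustration that assumes both objectives (complexity and error) are minimized; the function name `pareto_front` and the sample values are hypothetical and are not taken from the study.

```python
import numpy as np

def pareto_front(complexity, error):
    """Return indices of non-dominated points when both objectives are minimized."""
    complexity = np.asarray(complexity, dtype=float)
    error = np.asarray(error, dtype=float)
    keep = []
    for i in range(len(error)):
        # A point dominates point i if it is no worse in both objectives
        # and strictly better in at least one of them.
        no_worse = (complexity <= complexity[i]) & (error <= error[i])
        better = (complexity < complexity[i]) | (error < error[i])
        if not np.any(no_worse & better):
            keep.append(i)
    return keep

# Hypothetical (complexity, error) pairs, not values from this study.
complexity = [10, 20, 20, 35, 50]
error = [0.09, 0.05, 0.07, 0.06, 0.04]
front = pareto_front(complexity, error)
print(front)
```

Every retained point can only be improved in one objective at the expense of the other, which is the trade-off that the Pareto front expresses.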
Eight Pareto-optimal models were identified, and three of them were selected because they offered a good balance between performance and complexity during the training period. The general form of the mathematical expression for the evolved models is given by:
where α1 to α5 denote the weighting coefficients of the genes, and γ is the bias term.
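Written out, a five-gene model built from these weighting coefficients and the bias term takes the weighted-sum form below. The symbols $\hat{y}$ (the predicted dimensionless jet centerline concentration) and $G_i$ (the output of the $i$-th gene) are notation assumed here for illustration; a model with fewer genes simply omits the corresponding terms.

```latex
\hat{y} = \gamma + \alpha_1 G_1 + \alpha_2 G_2 + \alpha_3 G_3 + \alpha_4 G_4 + \alpha_5 G_5,
\qquad G_i = G_i\!\left(Fr,\ \beta,\ Z/D\right)
```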
The genes (including their weighting coefficients) and the bias for each model are summarized in
Table 1. It can be seen in the table that Model A has five genes, so it is the most complex of the three models. Model B has four genes, so it is slightly simpler than Model A. Model C has only two genes, so it is the simplest of the three. These models were created through a process of reproduction, crossover, and mutation, as explained in
Section 2.3. Since many terms in these equations could not be found by using a traditional regression technique, the MGGP technique has the advantage of being able to detect the hidden relationships between variables.
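In multigene GP implementations, the gene weights and the bias are commonly obtained by ordinary least squares once the gene structures are fixed. The sketch below illustrates that step under the assumption that the coefficients here were fitted in this way. The two gene expressions, the input ranges, and the synthetic data are hypothetical and are not the evolved genes reported in Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic inputs over hypothetical ranges of Fr, beta, and Z/D.
Fr = rng.uniform(5.0, 30.0, 200)
beta = rng.uniform(0.0, 1.0, 200)
ZD = rng.uniform(5.0, 60.0, 200)

def gene_outputs(Fr, beta, ZD):
    """Evaluate two illustrative genes; each column is one gene output."""
    g1 = Fr / ZD
    g2 = beta * np.exp(-ZD / 30.0)
    return np.column_stack([g1, g2])

# Synthetic target built from known coefficients so the fit can be checked.
G = gene_outputs(Fr, beta, ZD)
true_bias, true_weights = 0.05, np.array([0.12, 0.30])
y = true_bias + G @ true_weights

# Prepend a column of ones for the bias, then solve the least-squares problem.
A = np.column_stack([np.ones(len(y)), G])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
bias, weights = coef[0], coef[1:]
print(bias, weights)
```

Because the genes enter the model linearly, the evolutionary search only has to discover the gene structures, while the coefficients follow from a linear fit.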
Figure 3b presents the fitness in each generation. It can be observed that the model performance was relatively poor at the beginning of the evolving process. However, the prediction errors quickly decreased with the increasing number of generations. The errors converged to acceptable magnitudes after about 20 generations. After approximately 30 generations, the improvement of model performance with the number of generations became insignificant, indicating that a better outcome could not be expected by performing the MGGP for more generations, and thus the current number of generations was sufficient.
3.2. General Observations
For general observations of the jet dilution processes and the performances of the MGGP models, the measured and modeled concentration profiles for selected cases are presented in
Figure 4. The concentration generally decreased with increasing
Z/D because vertical buoyant jets are diluted along the trajectory due to the jet spreading and ambient water entrainment. The concentration was generally higher (i.e., the dilution was lower) when the Fr number was high because the magnitude of the buoyancy force becomes less significant compared with the inertial force at higher Fr numbers. The concentration generally increased with the increase of
β because the intrusion of ambient water was restricted and consequently the dilution was reduced. All three MGGP models captured these observations correctly.
For each case, the markers corresponding to the results of Model A are very close to the experimental ones, indicating a good generalization capacity of Model A. The data points of Model B are close to those of Model A. Most of the subtrees in Model B are identical to their counterparts in Model A, implying that these subtrees were the most influential elements and that the remaining subtrees affected the predictions only to a small degree. The Model C points deviate farther from the experimental points, indicating that Model C was over-simplified and thus was not able to predict the jet concentration satisfactorily. It should be noted that a few MGGP predictions of the dimensionless jet centerline concentration exceeded one, which is not physically reasonable. This reveals a deficiency of data-driven models: they do not strictly enforce physical constraints. The outcome can be improved by training the models with more data at extreme values, such as 0 and 1, or by post-processing the results according to the following bounding criteria:
For the purpose of assessing the original MGGP models, the post-processed predictions are not further discussed herein.
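As an illustration of such post-processing, and assuming that the bounding criteria simply cap the predicted dimensionless concentration to the physical range from 0 to 1, the predictions could be clipped as sketched below. The function name and the sample values are hypothetical.

```python
import numpy as np

def bound_concentration(c_pred):
    """Clip predicted dimensionless concentrations to the range [0, 1]."""
    return np.clip(np.asarray(c_pred, dtype=float), 0.0, 1.0)

# Hypothetical raw model outputs, including two non-physical values.
raw = [1.04, 0.62, -0.01, 0.15]
bounded = bound_concentration(raw)
print(bounded)
```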
3.3. Quantitative Assessment of Model Performances
For a detailed quantitative evaluation of the performances of the evolved models, the actual and MGGP results at all the data points are plotted in
Figure 5. The root-mean-squared error (RMSE) and coefficient of determination (R2) values are also shown in the same plots. The training data points were used to evolve and select the MGGP models, and the testing data points served as unseen data for assessing the predictive capacity of the evolved models. For the training period, the results shown in
Figure 5 are consistent with the observations in
Figure 3a; namely, Model A (RMSE = 0.037, R2 = 0.968) had the best performance, whereas Model C (RMSE = 0.094, R2 = 0.798) had the worst performance, among the models on the Pareto front in fitting the training data sets for dimensionless jet centerline concentration.
For the testing data points, the performances of Model A (RMSE = 0.039, R2 = 0.956) and Model B (RMSE = 0.039, R2 = 0.957) were almost identical. Because of its lower complexity, Model B could be more favorable than Model A if model simplicity is of great importance. The fitting indices of Models A and B for the testing data sets were only slightly different from those for the training data sets, demonstrating that the risk of over-fitting was well controlled. Model C is much simpler than Models A and B, as it has only two genes, but its performance in predicting the dimensionless jet centerline concentration was also much poorer (RMSE = 0.082, R2 = 0.808). This confirms that Model C was too simple (under-fitted) and thus showed consistently poor performance in both the training and testing periods.
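The two fitting indices used above can be computed as sketched below. The sketch assumes that R2 is defined as one minus the ratio of the residual sum of squares to the total sum of squares; a definition based on the squared correlation coefficient would give slightly different values. The sample arrays are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between measurements and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical measured and predicted dimensionless concentrations.
measured = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
print(rmse(measured, predicted), r_squared(measured, predicted))
```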
To evaluate further the evolved MGGP models, scatter plots of the actual and MGGP predicted results are presented in
Figure 6.
Figure 6a shows an excellent match between the measurements and the Model A predictions. Only a few symbols lie farther from the identity line. These symbols correspond to the extreme values, i.e., when the output variable is close to 0 or 1, as discussed earlier. The symbols for Model B are located near those for Model A, with several points deviating slightly farther from the line of agreement than those of Model A for the training data points. The plots and the R2 values indicate that both Models A and B left less than 5% of the variance in the data sets unexplained.
Figure 6c shows that the symbols for Model C are scattered over a wider range, with about 20% of the variance unexplained, revealing the poor performance of Model C.
In summary, the detailed comparisons presented above demonstrated that both Model A and Model B can provide very accurate predictions of the dimensionless centerline concentrations of vertical buoyant jets subjected to lateral confinement, whereas Model C performed poorly. If model complexity is of major concern, Model B is suggested. However, because the evolved models can be easily executed within the MATLAB environment and Model A has a better generalization capacity for the entire data sets, the remainder of this paper focuses on Model A.
3.4. Comparison with the SGGP Model
The Pareto front plot for the SGGP modeling is presented in
Figure 7a. These models were also created through the evolutionary process described earlier in this paper. The only difference between the SGGP and MGGP algorithms was that only one gene was utilized in an SGGP chromosome. Figure 7a clearly shows that the fitness of each population in the SGGP modeling was poorer than that in the MGGP modeling. Only three solutions fall on the Pareto front, and the best one (in a red circle) was selected for further analysis. The mathematical expression for this model is:
Compared with the MGGP models, this SGGP model is much simpler because it only has one gene.
Figure 7b reports the convergence characteristics of the SGGP solutions. Compared with the MGGP modeling, the errors in the SGGP algorithm decreased at a slower rate. The error curve became smoother after about 40 generations, so more SGGP generations were not necessary because they would not noticeably improve the outcome. The mean fitness of the SGGP modeling in
Figure 7b is lower than that of the MGGP modeling, indicating that the overall performance of the SGGP algorithm was poorer than that of MGGP.
Figure 8 compares the measured and modeled dimensionless jet centerline concentration at each data point.
The RMSE and R2 values are also reported. For the training data sets, the predictions of the best SGGP model (RMSE = 0.068, R2 = 0.895) were obviously less accurate than the MGGP Models A and B but more accurate than the MGGP Model C. In terms of the testing data sets, the RMSE and R2 values in the SGGP predictions were 0.063 and 0.888, respectively, which also demonstrated that the performance of the SGGP model was better than the MGGP Model C but worse than the MGGP Models A and B.
In summary, the above findings revealed the superior performance of MGGP over the SGGP algorithm in evolving an accurate model. However, the lower RMSE and higher R2 values compared with the MGGP Model C indicate that an SGGP-based model could have a higher prediction capability than an MGGP model that is over-simplified.
3.5. Comparison with the Existing Empirical Equation
Lee and Lee [
10] proposed a regression-based empirical equation for the dimensionless jet centerline concentration, which can be expressed as:
This equation has been used by Lee and Lee [
10] and Yan and Mohammadian [
19] to estimate and plot the concentration profiles of a laterally confined jet, and the good agreement with experimental results demonstrated the generalization capability of this equation. A scatter plot of the dimensionless jet centerline concentration at each data point calculated using the empirical equation is shown in
Figure 9. The MGGP and SGGP data points are also depicted in the same figure. Surprisingly, the RMSE and R2 values of the empirical results (RMSE = 0.068, R2 = 0.895) were almost identical to those of the SGGP results (RMSE = 0.067, R2 = 0.894) for the entire data sets. A closer examination of the evolved SGGP model and the empirical formulation shows that the two models have a very similar structure. These observations demonstrated that the SGGP algorithm can easily evolve an explicit model that is as accurate as a regression-based empirical model, which normally requires extensive analyses and efforts. The mean absolute errors (MAEs) were also calculated, and the values corresponding to the MGGP, the SGGP, and the empirical models were 0.027, 0.044, and 0.046, respectively. The lower MAE and RMSE and higher R2 values of the MGGP predictions also demonstrated the generalization capacity of the MGGP model.
3.6. Prediction Confidence Analysis
Figure 10 presents the prediction confidence analysis result for the best MGGP model (Model A). The analysis was performed using MATLAB’s nonlinear regression prediction confidence interval function, “nlpredci”. This function is based on the symmetric confidence interval approach and can calculate the 95% confidence interval half-width at each data point [
34]. The data points shown in
Figure 10 have been sorted based on the MGGP prediction from largest to smallest for display purposes. This figure clearly shows that the MGGP predictions followed the overall data trend very well, with the prediction points being uniformly distributed above or below the curve for the experimental data. The confidence range was relatively narrow, with the mean 95% confidence interval half-width being 0.093.
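For reference, a pointwise confidence interval half-width for a nonlinear regression prediction is commonly obtained with the delta method. The expression below sketches that standard form and is not a statement of the exact computation performed in this study. The symbols are notation introduced here: $\mathbf{j}_i$ is the gradient of the prediction with respect to the fitted parameters at data point $i$, $J$ is the Jacobian over the $n$ fitting points, $p$ is the number of fitted parameters, and $s^2$ is the residual variance.

```latex
\delta_i = t_{0.975,\,n-p}\,
\sqrt{\,s^2\,\mathbf{j}_i^{\top}\left(J^{\top}J\right)^{-1}\mathbf{j}_i\,},
\qquad
s^2 = \frac{1}{n-p}\sum_{k=1}^{n}\left(y_k-\hat{y}_k\right)^2
```

Under this form, the 95% confidence interval at point $i$ is $\hat{y}_i \pm \delta_i$, and the mean half-width is the average of $\delta_i$ over all data points.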
4. Conclusions
In this work, an alternative approach based on MGGP for predicting the initial dilution of vertical buoyant jets subjected to lateral confinement was proposed. Pareto-optimal MGGP-based models were developed to estimate the dimensionless jet centerline concentration using the dimensionless parameters Fr, β, and Z/D. The best MGGP model (Model A) performed consistently well in modeling both the experimental training (RMSE = 0.037, R2 = 0.968) and testing data sets (RMSE = 0.039, R2 = 0.956). A less complex candidate MGGP model (Model B) was also found to be accurate in predicting the experimental data (training: RMSE = 0.041, R2 = 0.962; testing: RMSE = 0.039, R2 = 0.957) and may be preferable when model simplicity is of major importance. The best MGGP model had lower errors and higher correlations in fitting the entire data sets (MAE = 0.027, RMSE = 0.038, R2 = 0.966) than the best SGGP model (MAE = 0.044, RMSE = 0.067, R2 = 0.894) and the existing empirical model (MAE = 0.046, RMSE = 0.068, R2 = 0.895). The nonlinear regression prediction confidence interval analysis revealed that the mean 95% confidence interval half-width of Model A was 0.093. These results are encouraging, and the MGGP technique will therefore be applied to other effluent mixing problems in future work.