1. Introduction
Understanding the mechanisms and patterns of pollutant dispersion is crucial for simulating chemical processes, protecting ecosystems, safeguarding public health, and optimizing environmental management [1]. Accurate prediction of jet concentration fields clarifies how pollutants disperse, supports the simulation of their chemical processes, and helps optimize pollution control strategies. However, jet concentration fields involve complex flow characteristics, especially in inclined plane jets, where the velocity distribution of the fluid, shear effects, and boundary effects produce highly nonlinear changes in the concentration field. Handling the resulting high-dimensional data makes modeling more difficult and predictions more complex. Traditional jet simulation methods, such as physical model experiments, empirical formulas, and semi-empirical models, suit simplified flow models but are limited in applicability and accuracy for more complex, multi-physics coupled flow phenomena [2]. Numerical simulations can address the accuracy limitations of traditional methods, but they demand substantial computational resources; for complex geometries and large-scale flow problems, the computational load can be immense and computation times long.
With the rapid development of computer technology, machine learning methods, as an emerging technology, have gradually demonstrated their potential in modeling complex systems. Machine learning methods, which learn patterns automatically from historical data, can effectively capture complex nonlinear relationships, reducing the reliance on physical process modeling. These methods may offer new insights into jet concentration field prediction in complex environments.
To validate the performance of machine learning methods in simulating jet concentration distributions in complex environments, this study uses machine learning techniques to predict the concentration field of inclined plane jets in a linear stratified environment. Inclined plane jets are widely used in environmental pollution control, water quality monitoring, and marine engineering, and predicting their concentration distribution is crucial for pollution control and water resource management. In stratified environments in particular, different fluid layers have different physical properties, and accurately describing interlayer interactions is critical for accurate flow simulation. The case is also a useful test for machine learning methods: because jet concentration fields in such environments are influenced by multiple factors and the flow process is highly nonlinear, it helps verify the effectiveness of machine learning in flow field simulation and assess its potential in environmental flow problems.
Traditional methods for jet concentration field prediction mainly rely on physical models, dimensional analysis, and experimental studies. For instance, El-Ghorab et al. [3] constructed a physical model using geometric similarity to investigate the impact of emissions, temperature, and flow state on mixing zone characteristics; Jirka et al. [4] defined the geometry and mixing characteristics of buoyant jets through dimensional analysis and proposed standards for deep-water and shallow-water jets. Furthermore, Knudsen et al. [5] derived dilution and trajectory formulas for buoyant axisymmetric jets, while Kaminski et al. [6] proposed a new theoretical model based on local Richardson numbers to study entrainment behavior in turbulent jets. In experimental studies, Bashitialshaaer et al. [7] conducted experimental research on the behavior of inclined negatively buoyant jets, analyzing the effects of parameters such as nozzle diameter, jet angle, density, and flow rate on the jet trajectory, while Roberts et al. [8] analyzed the behavior of non-buoyant jets discharged horizontally in stratified fluids; Wallace et al. [9] and Lee et al. [10] studied the expansion and mixing behavior of buoyant jets in stratified fluids through experiments. These methods provide theoretical and experimental support for predicting jet concentration fields, but their applicability and accuracy are limited in complex flows and multilayer fluids.
To overcome the accuracy limitations of traditional methods, computational fluid dynamics (CFD) has gradually become a mainstream tool. Jirka et al. [11] proposed an overall model for buoyant jets considering turbulence and resistance mechanisms, validating its accuracy under different flow conditions. Amamou et al. [12] simulated the process of turbulent circular jets entering reverse flow using Reynolds stress models, with results consistent with experiments. Meng et al. [13] explored the impact of density Froude numbers and jet-to-flow velocity ratios on jet behavior through numerical simulations, while Gao et al. [14] used mixture and renormalization group models to simulate jet behavior and studied the effects of density Froude numbers and jet-to-flow velocity ratios on jet behavior. Ishigaki et al. [15] proposed an improved solver to reduce the impact of mesh non-orthogonality on buoyant jet numerical solutions. Additionally, Belcaid et al. [16] validated numerical simulation results for buoyant wall jet turbulence in uniform saline water; Xu et al. [17,18,19] and Chen et al. [20] investigated the effect of waves on jets through large eddy simulation and FLOW-3D. These studies show that numerical simulations can provide more accurate and comprehensive predictions than traditional methods when dealing with complex fluid dynamics and multi-physics coupling problems. However, in a linear stratified environment, numerical simulations still require layered grids and high-precision interpolation methods to handle complex boundary effects and concentration diffusion processes, resulting in a high computational cost.
Due to the high computational cost of numerical simulations, researchers have begun exploring more efficient methods, and machine learning, with its data-driven characteristics, has become a potential solution. Machine learning techniques are now beginning to be applied to the study of jet characteristics. Yan et al. [21] proposed a method based on multi-gene genetic programming (MGGP) to predict the trajectory of a rosette momentum jet group in flowing currents; the model outperformed a single-gene genetic programming (SGGP) model. Such work indicates that machine learning methods provide new ideas for studying jet characteristics. Compared to numerical simulations, machine learning can simplify the model-building process and shows higher adaptability and predictive ability when facing complex physical processes. However, related research is still in its infancy, and machine learning has seen only limited application to predicting jet concentration fields under complex flow conditions. Common machine learning models, such as SVR, linear regression, genetic programming, and AdaBoost, differ in generalization ability and computational efficiency, and their applicability to this physical problem is not yet clear. The current challenge is how to improve computational efficiency while ensuring prediction accuracy.
Random forests, by integrating multiple decision trees, effectively reduce the risk of overfitting, improving both computational efficiency and the accuracy and stability of the model. Compared to the reference machine learning models (SVR, linear regression, genetic programming, and AdaBoost), random forest does not require complex kernel function selection or parameter tuning, and it is less susceptible to noise. It may be better suited to handle the potential nonlinear relationships in the jet concentration field. Hassanzadeh et al. [22] predicted the laminar length and expansion angle of positively buoyant mixing jets using various machine learning methods, and the study showed that the random forest algorithm outperformed other models, providing more accurate predictions of jet expansion behavior under different conditions. However, there has been no study using random forest models to predict jet concentration fields in a linear stratified environment with inclined plane jets.
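The ensemble idea described above can be illustrated with a minimal sketch. It assumes scikit-learn's RandomForestRegressor and a synthetic nonlinear target; the data, the number of trees, and the variable names are illustrative assumptions and are not the dataset or settings of this study. It shows that a regression forest's prediction is the average of the predictions of its individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's dataset): a nonlinear
# function of two features plus a little noise.
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.sin(3.0 * X[:, 0]) * np.exp(-X[:, 1] ** 2) + rng.normal(0.0, 0.05, 500)

# 50 trees is an arbitrary illustrative choice.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# Each fitted tree predicts on its own; rows are trees, columns are samples.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])

# The forest's prediction is the mean over its trees.
forest_pred = forest.predict(X)
averaged = per_tree.mean(axis=0)

# Average disagreement between trees at each sample. Averaging over the
# trees is what damps the variance of any single tree.
tree_spread = per_tree.std(axis=0).mean()
print("forest equals tree average:", np.allclose(forest_pred, averaged))
print("mean spread across trees:", tree_spread)
```

Because each tree is grown on a different bootstrap sample, the individual trees disagree, and the averaged prediction is less sensitive to noise than any one of them.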
Therefore, this study explores the application of random forest models under complex flow conditions. Specifically, this study uses OpenFOAM to simulate jet concentration fields under different operating conditions, constructs a comprehensive dataset, and applies the random forest model to predict the concentration distribution. By comparing the prediction performance of random forest with other machine learning models, the superiority of random forests in nonlinear, high-dimensional flow problems is verified. To the best of the author’s knowledge, this is the first study to apply the random forest algorithm to predict the concentration field of inclined plane jets in a linear stratified environment. The method effectively captures nonlinear relationships and high-dimensional features in complex flows, demonstrating a robust performance and accuracy. This research provides a new approach for understanding and predicting concentration changes in linear stratified environments and offers valuable references for further optimizing fluid dynamics models.
4. Discussion
This study is the first to apply the random forest algorithm to the prediction of the concentration field of inclined plane jets in a linear stratified environment, filling a gap in the application of machine learning to complex stratified flow modeling. Unlike previous studies, which mainly focus on traditional fluid mechanics methods or simple analytical models, this research leverages the ensemble of multiple decision trees in the random forest model to reproduce the phenomena effectively and efficiently. With the underlying fluid physical processes taken into account, the model accurately simulates the concentration distribution at different layers in stratified water bodies. It demonstrates strong environmental adaptability and generalization ability, providing fast and accurate predictions under varying conditions. The robustness of the random forest model in predicting the concentration field of inclined plane jets in a linear stratified environment was validated using both comprehensive datasets (435,200 data points in total) and unseen datasets (108,800 data points in total). The results show that an accurate and efficient predictive model can be constructed even with limited training data. Preliminary tests indicate that, for this physical problem, the random forest algorithm outperforms the support vector regression, linear regression, genetic programming, and Adaptive Boosting (AdaBoost) algorithms.
To further evaluate the performance of each machine learning model, the R² and RMSE values for each case were calculated, as shown in Figure 11 and Figure 12. The error bars in the figures represent the 90% confidence intervals. The results indicate that random forest performed best across all cases: its R² value was the closest to 1, its RMSE value was the lowest, and its error bars were shorter, suggesting excellent fitting capability, good error control, and stable results. Genetic programming and linear regression demonstrated some fitting capability in cases C01–C06, but as the velocity increased their R² values decreased significantly and their RMSE values rose, indicating limitations in handling complex nonlinear problems. The performance of AdaBoost was similar to that of genetic programming and linear regression, but slightly less stable. In contrast, support vector regression had lower R² values, higher RMSE values, and larger error ranges, showing poor fitting capability and error control. In terms of computational efficiency and resource consumption, the models were trained on a computer equipped with a 13th generation Intel(R) Core(TM) i5 processor and 16 GB of memory. The running times of the random forest, linear regression, genetic programming, and AdaBoost algorithms were all below 90 s; linear regression trained fastest, but its predictive performance is limited by its model assumptions. SVR took the longest (7 min 32 s) because of the high complexity of its kernel function calculations. Random forest therefore strikes a good balance between accuracy and efficiency. In summary, the random forest algorithm exhibited superior stability and high performance in predicting complex jet concentration fields compared to the other machine learning algorithms. This can be attributed to the advantages of ensemble learning: random forest constructs multiple decision trees and combines their predictions, which reduces the risk of overfitting and captures complex nonlinear relationships in the data. It also adapts automatically to interactions between data features.
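For readers who want to reproduce the two metrics, the following is a minimal sketch. It assumes the standard definitions of the coefficient of determination and the root-mean-square error, which this section does not state explicitly, and the concentration values are made up for illustration. The 90% confidence intervals are not computed here because the procedure used to obtain them is not described in this section.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Hypothetical normalized concentrations, for illustration only.
c_true = [0.0, 0.2, 0.4, 0.6, 0.8]
c_pred = [0.1, 0.2, 0.4, 0.6, 0.7]

print("R2   =", round(r2(c_true, c_pred), 4))    # 0.95
print("RMSE =", round(rmse(c_true, c_pred), 4))  # 0.0632
```

An R² close to 1 together with a low RMSE corresponds to the behavior reported above for the random forest model.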
In predicting the concentration field of complex jet processes, the machine learning method proposed in this study significantly improves computational efficiency and prediction accuracy compared to traditional methods. Traditional jet simulation methods include empirical formulas and numerical simulations. Empirical formulas and semi-empirical models are based on specific experimental conditions or simplified assumptions, which limits their applicability. For complex flow phenomena, particularly the diffusion of inclined plane jets in a stratified environment, these methods cannot provide sufficiently accurate results. Numerical simulation methods can describe fluid dynamics processes more precisely, but they rely on complex physical equations that are difficult to solve for complex systems and that require detailed modeling of many aspects of the system. Furthermore, when faced with high-dimensional and complex data, numerical simulations are often constrained by computational resources and model assumptions, resulting in low computational efficiency. For the inclined plane jet in a linear stratified environment considered here, a numerical simulation takes several hours. In contrast, machine learning methods learn patterns automatically from historical data, without the need to model physical processes explicitly, which reduces modeling complexity and the dependence on assumptions. In this study, the computation time is reduced to the order of minutes. The machine learning model developed here is not intended to surpass the prediction accuracy of numerical simulations, but rather to serve as an efficient alternative.
While maintaining a prediction accuracy comparable to that of numerical simulations, the model reduces the computation time from the order of hours to the order of minutes, a two-order-of-magnitude improvement in efficiency. For simulating the concentration field of inclined jets in complex environments in particular, machine learning can handle multiple input variables, such as flow velocity, temperature, and pressure, and thus avoids the tedious recalculation for each specific scenario that traditional methods require. Once trained, machine learning models make rapid predictions with strong generalization ability and low resource consumption. Machine learning methods are therefore more flexible, efficient, and adaptable than traditional methods when simulating the concentration field of complex jet processes.
In practical applications, taking the prediction of the pollutant diffusion concentration field for an inclined jet at a coastal sewage outlet as an example, the implementation steps are as follows. First, obtain the coordinates (x, y) and the corresponding velocity components (u, v) at each point in the region, and organize these data into a four-column Excel table (the first two columns are u and v, and the last two columns are the x- and y-coordinates). Next, normalize the data and input the normalized dataset into the pre-trained random forest model, which outputs normalized concentration values. Then inverse-normalize these values to obtain the actual concentration field distribution. Finally, visualize the concentration field using heatmaps or contour plots, which give an intuitive display of the concentration distribution in the region and serve as a decision-making basis for optimizing the sewage discharge plan. The entire process from data input to result output takes only a few seconds, improving the efficiency by two orders of magnitude compared to traditional CFD simulations, and requires no complex grid partitioning or iterative calculations, highlighting the advantages of machine learning models in practical engineering applications.
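The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: it assumes scikit-learn, min-max normalization (the normalization scheme is not specified in this section), and a stand-in model trained on synthetic data in place of the pre-trained random forest. The function name predict_concentration and all numerical values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)

# --- Stand-in for the pre-trained model (synthetic data, illustration only) ---
n = 2000
train_inputs = np.column_stack([
    rng.uniform(0.0, 1.0, n),    # u: velocity component
    rng.uniform(-0.5, 0.5, n),   # v: velocity component
    rng.uniform(0.0, 10.0, n),   # x-coordinate
    rng.uniform(0.0, 5.0, n),    # y-coordinate
])
# Made-up concentration field; it has no physical meaning.
train_conc = (np.exp(-0.3 * train_inputs[:, 2])
              * np.exp(-(train_inputs[:, 3] - 2.5) ** 2))

# Assumption: min-max scaling of inputs and of the concentration.
x_scaler = MinMaxScaler().fit(train_inputs)
c_scaler = MinMaxScaler().fit(train_conc.reshape(-1, 1))

model = RandomForestRegressor(n_estimators=30, random_state=0)
model.fit(x_scaler.transform(train_inputs),
          c_scaler.transform(train_conc.reshape(-1, 1)).ravel())

# --- Prediction workflow: normalize, predict, inverse-normalize ---
def predict_concentration(table):
    """table: array of shape (n_points, 4) with columns u, v, x, y."""
    scaled = x_scaler.transform(table)      # normalize the inputs
    c_norm = model.predict(scaled)          # normalized concentration
    return c_scaler.inverse_transform(c_norm.reshape(-1, 1)).ravel()

# Four-column table on a regular grid (hypothetical uniform velocity).
xs = np.linspace(0.0, 10.0, 20)
ys = np.linspace(0.0, 5.0, 10)
gx, gy = np.meshgrid(xs, ys)
table = np.column_stack([
    np.full(gx.size, 0.5),   # u
    np.zeros(gx.size),       # v
    gx.ravel(),              # x
    gy.ravel(),              # y
])

conc = predict_concentration(table)
field = conc.reshape(gx.shape)   # 2D array, ready for a contour plot
print(field.shape, float(field.min()), float(field.max()))
```

In practice the four-column table could be loaded from the Excel file (for example with pandas.read_excel), and the resulting field array passed to a plotting routine such as matplotlib's contourf to produce the heatmap or contour plot.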
This study tested the performance of the random forest, support vector regression, linear regression, genetic programming, and AdaBoost algorithms in simulating the concentration field of inclined plane jets in a linear stratified environment. However, it did not account for complex scenarios involving the coupling of multi-scale and multi-physical processes. In practical applications, physical phenomena in water bodies may be more complex, involving turbulence, sedimentation, thermodynamics, and other factors. Future research can improve the modeling of these complex physical processes and incorporate them into machine learning models to further improve prediction accuracy and applicability.