New Approach to Predict the Motion Characteristics of Single Bubbles in Still Water

Featured Application: The outcome of this study may be beneficial to researchers, engineers, and designers dealing with the motion of single bubbles in still water.

Abstract: Under the action of gravity, buoyancy, and surface tension, bubbles generated by wave breaking rupture and coalesce, causing high-speed jets and strong turbulence in nearby water bodies, which in turn affects sea-air exchange, sediment transport, and pollutant movement. These interactions are closely related to the shape and velocity changes of single bubbles. Therefore, understanding the motion characteristics of single bubbles is essential. In this research, a large number of experiments were carried out to serve this purpose. The experimental data were used to develop three machine learning models for the bubble final velocity, bubble drag coefficient, and bubble shape, respectively. The performance of the feed forward back propagation neural network (FBNN) models for the final velocity and drag coefficient was evaluated. The coefficient of determination (R²) and root mean squared error (RMSE) of the final velocity prediction model were 0.83 and 0.0518, respectively. For the drag coefficient prediction model, the values are 0.92 for R² and 0.1534 for RMSE. The models provide more accurate output than the empirical formulas. K-nearest neighbours (KNN), logistic regression, and random forest were applied as the algorithms for the bubble shape classification model. The best performance is achieved by logistic regression.


Background
Gas-liquid two-phase flow exists widely in nature. It plays an important role in engineering applications, especially in problems related to sea-air exchange, cavitation of structures caused by high-speed dam water, natural gas transportation, etc. Under the effect of gravitational force, buoyancy force, and surface tension, bubbles may change shape, break up, and merge. These events lead to high-speed jets and strong turbulence in the nearby water, which affect the mechanisms of sea-air exchange and pollutant movement. These interactions are closely related to the state and nature of single bubbles. Therefore, it is essential to understand the motion characteristics of single bubbles in order to study their effect on sea-air flux exchange and on the stress exerted on structures, and to provide a fundamental theoretical basis for the development of marine engineering.
In most natural phenomena, as well as in practical applications, bubbles in water bodies play a significant role. According to previous analyses, phenomena such as underwater sound wave propagation, atmospheric-marine gas exchange, ship navigation, underwater laser transmission, nutrient supply for marine life, etc., are closely related to the motion characteristics of single bubbles in water [1].
Bubbles can be formed through different processes such as wave rolling, ocean biological activities, ship navigation, etc. Existing scientific evidence indicates that the gases entrained by breaking waves contain 30% of human-generated CO₂, which may help to alleviate global warming. In addition, the splash and entrainment due to breaking waves can enhance the exchange of heat, water vapour, gas, and energy in the sea.
On the other hand, the acoustic characteristics of bubbles in the ocean surface layer have attracted considerable attention. A large number of bubbles resulting from breaking waves also have a great impact on the optical properties of seawater: they scatter light and affect the remote sensing information of the sea surface [2]. Hence, studying the motion characteristics of single bubbles has important theoretical and scientific significance for the mechanisms of two-phase flow.

Problem Statement
In the past, to study the kinematic and dynamic mechanisms of single bubbles, the researchers used the theoretical analyses as well as a simplified model. Among all, Lord Rayleigh [3] was the first who assumed the liquid is incompressible, non-viscous, and non-rotating. With the assumption, he proposed a bubble transport equation. Such an equation has become the basis for the study of bubbles. It was widely applied and improved by the other researchers [4][5][6][7][8][9].
In general, the relationships among the motion characteristics of single bubbles, and between these characteristics and the external forces, are complicated. It is hard to describe the mechanisms of single bubbles using theoretical analyses alone. Hence, laboratory work is essential, especially for observing and analysing the process of bubble rise. The widely used experimental approaches for observing the upward movement of single bubbles in still water are the photographic method, particle image velocimetry (PIV) measurement, conductance probes, pressure sensing, and others [10][11][12][13][14][15][16][17][18].
However, due to certain limitations of the experimental instruments, the knowledge of single bubbles is still limited. This is because it is difficult to measure the single bubble motion and the flow pattern of the surrounding liquid without any interference.
Therefore, there is a need to have an alternative approach which is able to serve as a basis for the further exploration of single bubble motion and transport mechanisms under the breaking wave condition.

Motivation
Machine learning has a wide application. Feed forward back propagation neural network is one of the popular machine learning approaches [19]. It has the ability to overcome the constraints of the existing mathematical models and helps in problem-solving based on the complementary nature of both modelling and practices.
In addition, machine learning has played an important role in water-related engineering applications, such as reservoir operation [20,21], water management [22,23], turbine operation [24,25], water distribution networks [26,27], sediment settling velocity prediction [28], intelligent hydrological models [29,30], rheological prediction models [31,32], etc. Therefore, it is believed that such a method is also applicable to the study of the motion characteristics of single bubbles.
Machine learning models that can predict the final velocity, drag coefficient, and shape of single bubbles with an acceptable level of accuracy may be beneficial to researchers, engineers, and hydrologists dealing with bubble-related issues.

Research Objectives
The major aim of this research is to propose machine learning models that can provide a basis for studying single bubble motion and transport mechanisms under breaking wave conditions. The proposed models were developed from experimental data obtained in laboratory work. During the training process, the Reynolds (Re) number, Eötvös (Eo) number, and Weber (We) number, all of which are critical elements influencing the motion characteristics of single bubbles, were used as the input parameters. This study is innovative in its use of the machine learning approach to predict the final velocity, drag coefficient, and shape of single bubbles in still water.

Materials and Methods
Digital image measurement has been widely used in the field of two-phase flow because it can visualise the motion of single bubbles without disturbing the flow field. In this study, a high-speed camera was used to capture the whole process of bubble production, motion, and deformation in still water.

Laboratory Experiment
First of all, a special tank was designed using rectangular transparent plexiglass. The top was connected to the atmosphere to create a normal pressure state, as shown in Figure 1. There was also a bubble generating device and a drain setting at the bottom of the tank.

Tap water was poured into the tank to a level of 250 mm. The experiments were started only after the water surface had calmed and the water temperature had reached room temperature.
A peristaltic pump was used to control gas production. A suitable flow rate must be ensured for the generation of the bubbles. At a low flow rate, bubble formation is very slow, so the bubble has difficulty detaching from the orifice and liquid can easily flow into the orifice. If the flow rate is too large, bubbles are generated at a very high frequency, and it is hard to form single bubbles under such conditions. The equivalent diameter of the bubbles ranged from 2 to 7.5 mm, and the periodic injection mode was selected. To ensure periodicity, the detachment frequency, which represents the number of bubbles per unit time, was kept constant throughout the experiments.
In order to clearly capture the whole evolution process of single bubbles, a high-speed two-dimensional camera was set at the side of the bubble generation section. A three-colour soft light with standard colour temperature and brightness was used to support the image capturing process.
The experiments were carried out in a dark room under different flow rate, breakaway frequency, and needle size (to control the size of bubble) conditions. The physical properties of the tested samples are shown in Table 1.

Data Extraction
The motion of single bubbles is closely related to several factors, such as frequency of separation, physical properties of the liquid, size of the bubble, and shape of the bubble. The dynamic and kinematic characteristics of single bubbles in the still water were studied by changing the bubble detachment frequency and the orifice diameter in the specially designed tank.
Digital image processing was used to extract data such as bubble velocity, drag coefficient, and bubble shape from the images captured during the experiments. The shape categorisation of the bubbles is based on the aspect ratio, the circularity, and visual inspection of the bubble. The aspect ratio is the ratio of the minor axis to the major axis. If this ratio ranges from 0.9 to 1, the bubble is considered spherical. Overall, the smaller the aspect ratio, the flatter the bubble. Circularity is a parameter used to describe the complexity of an object; it is the ratio of the square of the perimeter to the area. Combining aspect ratio and circularity with bubble visualization, the bubbles were divided into three groups in this study. Figure 2 illustrates the three main categories of bubble shape: spherical-cap, elliptical, and irregular.
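The two shape descriptors described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' image-processing code: the function names are hypothetical, and the aspect ratio is assumed to be taken as the shorter axis over the longer axis so that it lies between 0 and 1, in line with the 0.9 to 1 range quoted for spherical bubbles.

```python
import math

def aspect_ratio(axis_a: float, axis_b: float) -> float:
    """Shorter axis divided by longer axis (assumed convention): 1.0 for a
    circle, smaller values for flatter bubbles."""
    if axis_a <= 0 or axis_b <= 0:
        raise ValueError("axis lengths must be positive")
    short, long_ = sorted((axis_a, axis_b))
    return short / long_

def circularity(perimeter: float, area: float) -> float:
    """Square of the perimeter divided by the area, as defined in the text.
    A perfect circle gives 4*pi; more complex outlines give larger values."""
    if area <= 0:
        raise ValueError("area must be positive")
    return perimeter ** 2 / area

def looks_spherical(axis_a: float, axis_b: float, lower: float = 0.9) -> bool:
    """Apply the 0.9-to-1 aspect-ratio criterion quoted for spherical bubbles."""
    return aspect_ratio(axis_a, axis_b) >= lower
```

For a circle of radius r, the perimeter is 2πr and the area is πr², so the circularity evaluates to 4π regardless of size, which makes it a convenient reference value.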


Machine Learning Approaches
In general, machine learning can be divided into two major categories, which are supervised and unsupervised learning. The supervised learning method is more widely used, especially in solving the problem related to classification and regression.

Regression Problem-Solving
The feed forward back propagation neural network (FBNN) is well suited to nonlinear regression problems. The technique involves two processes: forward propagation of the input data from the input layer through the hidden layers to the output layer, and back propagation of the error through each layer, with repeated adjustment of the weights and biases until an optimised outcome is obtained. The general concept can be expressed mathematically as follows. Let $Y_i^n$ denote the output of the i-th neuron in the n-th layer of the neural network; then

$$Y_i^n = f\left(\sum_j w_{ij}^n x_j^{n-1} + b_i^n\right) \qquad (1)$$

where x is the input, Y is the output, w is the weight, b is the bias, and f is the transfer function. Activating the neurons with the transfer function is an important step; a commonly used transfer function is the tansig function, shown in Equation (2):

$$f(x) = \mathrm{tansig}(x) = \frac{2}{1 + e^{-2x}} - 1 \qquad (2)$$

In this study, FBNN was selected to develop the models for predicting the final velocity and drag coefficient of single bubbles, with the Re, Eo, and We numbers chosen as the input features.
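As an illustration of the forward propagation step, the sketch below passes an input vector through a stack of fully connected layers using the tansig transfer function, tansig(x) = 2 / (1 + exp(-2x)) - 1. It is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the output layer is assumed to be linear, which the source does not specify.

```python
import math

def tansig(x: float) -> float:
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, equal to tanh(x)."""
    if x < -350.0:  # avoid overflow in exp() for very negative inputs
        return -1.0
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

def layer_forward(inputs, weights, biases, activation=tansig):
    """One layer: weights[i][j] connects input j to neuron i."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Propagate inputs through `layers`, a list of (weights, biases) pairs.
    Hidden layers use tansig; the last layer is linear (an assumption)."""
    signal = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        is_output = index == len(layers) - 1
        activation = (lambda v: v) if is_output else tansig
        signal = layer_forward(signal, weights, biases, activation)
    return signal
```

A network with three inputs (Re, Eo, We), hidden layers of 10 neurons, and one output would be represented as a list of such (weights, biases) pairs; back propagation then adjusts those weights and biases to reduce the prediction error.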
The Re number is an important dimensionless quantity in fluid mechanics used to predict flow patterns in different fluid flow situations. It is defined as the ratio of inertial force to viscous force:

$$Re = \frac{\rho_l V_T d}{\mu_l} \qquad (3)$$

where Re is the Reynolds number, $\rho_l$ is the density of the liquid, $V_T$ is the final velocity of the bubble, d is the diameter of the bubble, and $\mu_l$ is the dynamic viscosity of the liquid. In this study, Re ranged from 100 to 3200.

The Eo number is another key quantity in fluid mechanics, measuring the importance of gravitational forces relative to surface tension and characterizing single bubble motion. It is obtained via:

$$Eo = \frac{g (\rho_l - \rho_g) d^2}{\sigma} \qquad (4)$$

where Eo is the Eötvös number, g is the gravitational acceleration, $\rho_l$ is the density of the liquid, $\rho_g$ is the density of the gas phase, and $\sigma$ is the surface tension. In this study, Eo ranged from 0.5 to 18.

The We number also plays an important role in fluid mechanics, especially in studying single bubble motion. It measures the relative importance of the fluid inertia compared with its surface tension:

$$We = \frac{\rho_l V_T^2 d}{\sigma} \qquad (5)$$

where We is the Weber number, $\rho_l$ is the density of the liquid, $V_T$ is the final velocity of the bubble, d is the diameter of the bubble, and $\sigma$ is the surface tension. Throughout the experiments, We ranged from 0 to 16.

The general architecture of the prediction models is shown in Figures 3 and 4.
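The three input features can be computed directly from measured quantities using the standard definitions Re = ρ_l·V_T·d/μ_l, Eo = g·(ρ_l − ρ_g)·d²/σ, and We = ρ_l·V_T²·d/σ. The sketch below is illustrative only; the function names are hypothetical, and the property values in the usage example are round-number assumptions for an air bubble in water, not the values from Table 1.

```python
def reynolds(rho_l: float, v_t: float, d: float, mu_l: float) -> float:
    """Ratio of inertial to viscous forces (SI units throughout)."""
    return rho_l * v_t * d / mu_l

def eotvos(rho_l: float, rho_g: float, d: float, sigma: float, g: float = 9.81) -> float:
    """Ratio of gravitational (buoyancy) forces to surface tension forces."""
    return g * (rho_l - rho_g) * d ** 2 / sigma

def weber(rho_l: float, v_t: float, d: float, sigma: float) -> float:
    """Ratio of fluid inertia to surface tension."""
    return rho_l * v_t ** 2 * d / sigma

# Assumed, rounded properties for an air bubble in water (illustration only).
RHO_L, RHO_G = 1000.0, 1.2   # kg/m^3
MU_L, SIGMA = 1.0e-3, 0.072  # Pa*s, N/m
D, V_T = 0.004, 0.25         # 4 mm bubble rising at 0.25 m/s

features = (
    reynolds(RHO_L, V_T, D, MU_L),
    eotvos(RHO_L, RHO_G, D, SIGMA),
    weber(RHO_L, V_T, D, SIGMA),
)
```

With these assumed values the feature vector is roughly (1000, 2.18, 3.47), which lies inside the Re, Eo, and We ranges reported for the experiments.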
Figure 4. Architecture for the drag coefficient prediction model.

Classification Problem-Solving
K-nearest neighbours (KNN), random forest, and logistic regression are common algorithms for classification problems. The KNN algorithm classifies a sample by measuring the distances between its feature values and those of the training samples and comparing these distances. Its principle is to find the k points in the training set that are closest to the sample and then assign the sample to the category most common among these k nearest neighbours. Typically, k is an integer not larger than 20 [33,34]. The basic parameters of the KNN algorithm are the distance metric, the value of k, and the classification decision rule.
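The input dataset X is given by:

$$X = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\} \qquad (6)$$

where x is the variable and y is the label. The Euclidean distance, shown in Equation (7), is the distance generally adopted:

$$d(x_i, x_j) = \sqrt{\sum_{l=1}^{n} \left(x_i^{(l)} - x_j^{(l)}\right)^2} \qquad (7)$$

where n is the number of features. The classification rule is as follows:

$$y = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j) \qquad (8)$$

where i = 1, 2, ..., N; j = 1, 2, ..., K; $N_k(x)$ is the set of the k nearest neighbours of x; and I is the indicator function, equal to 1 when $y_i = c_j$ and 0 otherwise.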
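The KNN procedure (Euclidean distance followed by a majority vote among the k nearest training samples) can be sketched as follows. This is a minimal pure-Python illustration with hypothetical function names and made-up toy data; it is not the platform used in the study. In practice the Re, Eo, and We features would usually be rescaled first, since Re is orders of magnitude larger than the other two and would otherwise dominate the distance.

```python
import math
from collections import Counter

def euclidean(a, b) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_x, train_y, query, k: int = 3):
    """Return the majority label among the k training points nearest to query.
    Ties go to the label whose first neighbour is closest to the query."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up, already-scaled toy features with two shape labels (illustration only).
toy_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
toy_y = [1, 1, 1, 2, 2, 2]
```

Calling `knn_predict(toy_x, toy_y, (0.15, 0.15), k=3)` assigns the query to label 1, because its three nearest neighbours all carry that label.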
The random forest algorithm, on the other hand, uses each feature as a candidate classification criterion. The classification error depends on how well each independent feature discriminates between categories and on the connections among the features.
By combining multiple weak classifiers, the final classification result is obtained through voting or averaging. The weak classifier is usually a classification and regression tree (CART) decision tree, which uses the Gini index to determine the most appropriate branching feature. The definition is shown in Equation (9):

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2 \qquad (9)$$

where $p_k$ is the probability of occurrence of the k-th category. The smaller the Gini index, the smaller the uncertainty and the better the data classification. The CART decision tree avoids over-fitting through pruning, using the loss function shown in Equation (10):

$$C_\alpha(T) = C(T) + \alpha |T| \qquad (10)$$

where C(T) is the error of the tree T on the training data, |T| is the number of leaf nodes of the tree, and $\alpha$ is a trade-off parameter.

The logistic regression algorithm classifies a sample according to the probability of it belonging to each category. In this study, the softmax regression algorithm was chosen to classify the bubble shape. Softmax regression converts the features into probabilities through the following function:

$$P(y = k \mid x; \theta) = \frac{e^{\theta_k^{T} x}}{\sum_{j=1}^{K} e^{\theta_j^{T} x}} \qquad (11)$$

where $\theta_k$ is the parameter vector of the k-th category. The cost function is

$$J(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} I(y_i = k) \log P(y_i = k \mid x_i; \theta) \qquad (12)$$

where the specific form of I(x) is

$$I(x) = \begin{cases} 1, & x \text{ is true} \\ 0, & x \text{ is false} \end{cases} \qquad (13)$$

Softmax regression uses the gradient descent method to solve for the model parameters. The gradient is as follows:

$$\nabla_{\theta_k} J(\theta) = -\frac{1}{N} \sum_{i=1}^{N} x_i \left[ I(y_i = k) - P(y_i = k \mid x_i; \theta) \right] \qquad (14)$$

All of the above-mentioned algorithms were used to classify the bubble shapes under different dimensionless parameters (Re number, Eo number, and We number). Figure 5 shows the overall flow chart of the machine learning implementation in this study.
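Two of the quantities used by these classifiers are easy to illustrate numerically: the Gini index of a node, computed here as one minus the sum of squared class probabilities, and the softmax function that turns per-class scores into probabilities, together with the mean negative log-likelihood it is trained on. The sketch below uses hypothetical function names and is a minimal illustration rather than the implementation used in the study.

```python
import math

def gini_index(probabilities) -> float:
    """Gini impurity: 1 - sum(p_k^2). Zero for a pure node."""
    return 1.0 - sum(p * p for p in probabilities)

def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1.
    The maximum score is subtracted first for numerical stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(prob_rows, labels) -> float:
    """Mean negative log-probability assigned to the true class.
    `labels` are 0-based class indices into each probability row."""
    return -sum(math.log(row[y]) for row, y in zip(prob_rows, labels)) / len(labels)
```

A node split evenly between two categories has a Gini index of 0.5, while a node containing a single category has a Gini index of 0, which is why the branching feature that lowers the index most is preferred.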

Prediction Model for the Final Velocity of Single Bubbles
There is a complex, nonlinear relationship between the final velocity of single bubbles and the dimensionless parameters (Re number, Eo number, and We number). Therefore, a feed forward back propagation neural network (FBNN) was developed to predict the final velocity.
A total of 1000 experimental datasets were collected and separated according to the ratio 70:15:15. Of these, 700 datasets were used to train the model, 150 datasets were used for calibration and validation, and the remaining 150 datasets were reserved for testing.
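The 70:15:15 partition can be reproduced with a short helper. The sketch below is illustrative only: the function name is hypothetical, and shuffling the samples with a fixed random seed before splitting is an assumption, since the source does not state how the datasets were assigned to each portion.

```python
import random

def split_dataset(samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle (assumed) and split samples into train, validation, and test sets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, validation, test

train, validation, test = split_dataset(range(1000))
```

Applied to 1000 samples, this yields 700, 150, and 150 samples, matching the counts reported above.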
In designing the architecture of the model, the tansig function was selected as the transfer function, while gradient descent was chosen as the optimization method. As shown in Figure 6, the input layer consisted of three nodes (Re number, Eo number, and We number), while the output layer had only one node (final velocity). There were three hidden layers, each consisting of 10 hidden neurons. In addition, the learning rate was set to 0.01, the number of training iterations was fixed at 1000, and the momentum factor was 0.9. With this combination of hyperparameters, the model exhibited good convergence during the training process.

To further verify the appropriateness of the FBNN model, the root mean squared error (RMSE) was chosen. A smaller RMSE value indicates a higher level of model accuracy. The RMSE is given in Equation (15):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[ z(x_i) - y(x_i) \right]^2} \qquad (15)$$

where n is the number of samples, z(x) is the predicted value, and y(x) is the actual value. The calculated RMSE is 0.0741 for the validation set and 0.0518 for the testing set. Such low RMSE values indicate that the developed model has achieved relatively good performance. It is able to reflect the relationship between the dimensionless numbers (Re number, Eo number, and We number) and the bubble final velocity with a considerably high level of accuracy.

Figure 9 depicts R² and RMSE for both the training and testing sets under varying sample size. As the sample size increases, the R² value shows a decreasing trend in the training set but an increasing trend in the testing set. Meanwhile, the RMSE value decreases gradually in the training set but increases gently in the testing set as the number of samples increases. Only at a sample size of around 900 do the RMSE values of the training and testing sets become almost the same, meaning that the model can provide a considerably high level of accuracy at that sample size. In other words, it is appropriate to choose a slightly larger sample size of 1000 in this study.
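The two evaluation metrics can be computed as in the sketch below. RMSE is the square root of the mean squared difference between predicted and actual values. For R², the source does not state which definition was used, so the common form 1 − SS_res/SS_tot is assumed here; the function names are hypothetical.

```python
import math

def rmse(predicted, actual) -> float:
    """Root mean squared error between predicted and actual values."""
    n = len(actual)
    return math.sqrt(sum((z - y) ** 2 for z, y in zip(predicted, actual)) / n)

def r_squared(predicted, actual) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - z) ** 2 for z, y in zip(predicted, actual))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives an RMSE of 0 and an R² of 1, while a model that always predicts the mean of the actual values gives an R² of 0.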
Figure 9. Relationship between R² and root mean squared error (RMSE) in both training and testing sets with respect to different number of samples.

Prediction Model for the Drag Coefficient of Single Bubbles
For the development of the bubble drag coefficient prediction model, the experimental data were divided into three portions: 692 datasets were used for training, 148 datasets were used for validation, and another 148 datasets were kept for testing.
The FBNN model was built with one input layer, three hidden layers, and one output layer, as shown in Figure 10. There were three nodes in the input layer (Re number, Eo number, and We number) and only one node in the output layer (bubble drag coefficient). Each hidden layer had 10 hidden neurons, and the tansig function was selected as the transfer function. With the learning rate set to 0.001, the number of training iterations to 1000, and the momentum factor to 0.85, the model exhibited good performance during the training process.
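The roles of the learning rate and momentum factor can be illustrated with a single-parameter example. The sketch below uses the classical momentum update, in which a velocity term accumulates a decaying sum of past gradients; this particular form is an assumption, because the source does not state which momentum variant or software was used, and the function names are hypothetical.

```python
def momentum_step(weight, velocity, gradient, learning_rate=0.001, momentum=0.85):
    """One classical-momentum update (assumed form):
    velocity <- momentum * velocity - learning_rate * gradient
    weight   <- weight + velocity"""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

def minimise(grad_fn, start, steps=1000, learning_rate=0.001, momentum=0.85):
    """Run `steps` momentum updates on one scalar parameter, starting at rest."""
    weight, velocity = start, 0.0
    for _ in range(steps):
        weight, velocity = momentum_step(
            weight, velocity, grad_fn(weight), learning_rate, momentum
        )
    return weight

# Toy objective (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = minimise(lambda w: 2.0 * (w - 3.0), start=0.0)
```

With a learning rate of 0.001 and a momentum factor of 0.85, 1000 updates bring the toy parameter very close to its optimum of 3; in the network the same update would be applied to every weight and bias using gradients from back propagation.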

From Figures 11 and 12, the R² values for the validation and testing sets are high, reaching 0.96 and 0.92, respectively. These values are close to 1, which indicates that the developed model has achieved good performance in terms of the coefficient of determination.
In addition, to further verify the suitability of the model, the RMSE value was calculated. It is 0.0946 for the validation set and 0.1534 for the testing set. Such low RMSE values indicate that the model performs well from the perspective of RMSE.
In order to further show that the developed AI model has the ability to serve as an alternative or replace the need for mathematical calculation, 150 datasets were randomly picked. The predicted values obtained from the model were compared with values calculated through the empirical formulas.
As shown in Figure 13, the drag coefficient (C D ) values calculated through the formula as proposed by Tomiyama et al. [35] show a larger deviation from the actual C D values. Meanwhile, the predicted C D values from the developed AI model and calculated C D values from the formula introduced by Dijkhuizen et al. [36] are in good agreement with the actual values.
From the perspective of the coefficient of determination, the FBNN model exhibits an R² value of 0.9593, while the formulas proposed by Tomiyama [35] and Dijkhuizen [36] show R² values of 0.9192 and 0.9362, respectively. The RMSE value was also calculated for each case: 0.1291 for the developed FBNN model, 0.2412 for Tomiyama's formula [35], and 0.1686 for Dijkhuizen's formula [36].
Based on the above statistical analyses, the values predicted by the developed FBNN model have the highest accuracy compared with the empirical formulas suggested by previous researchers.


Prediction Model for the Shape of Single Bubbles
A total of 710 datasets were used for learning and 304 for testing. A platform was built in a Python environment to predict the shape of single bubbles. Three algorithms were applied to perform the task: KNN, random forest, and logistic regression. The input variables are the Re number, Eo number, and We number, while the bubble shapes are labelled 1, 2, and 3 to represent bubbles with elliptical shape, spherical-cap shape, and irregular shape, respectively [37].

Table 2 shows the confusion matrix for the KNN model. Most of the predicted outputs fall into the correct category. In terms of recall rate, the true positive rate (TPR) and true negative rate (TNR) are the two important indicators. The TPR is 76.92% for elliptical bubbles, 76.16% for spherical-cap bubbles, and 86.11% for irregular bubbles. Meanwhile, the TNR for elliptical, spherical-cap, and irregular bubbles is 85%, 76.10%, and 99.25%, respectively. Both the TPR and TNR values for each category are above 75%, indicating that the classifiers of the model perform well. In addition, the Kappa coefficient of the model is 87.55%, showing that the model has a considerably high degree of consistency.

The confusion matrix for the logistic regression model is presented in Table 3. As with the KNN model, most of the predicted labels fall into the correct category. In terms of recall rate, elliptical bubbles have a TPR of 84.62%, spherical-cap bubbles have a TPR of 87.41%, and irregular bubbles have a TPR of 94.44%. The TNR is 90.90% for elliptical bubbles, 87.5% for spherical-cap bubbles, and 99.25% for irregular bubbles. The TPR and TNR values for all categories are higher than 80%, indicating that each classifier of the model exhibits good performance. Moreover, the Kappa coefficient of the model is 78.28%. As suggested by Cohen [38], such a Kappa coefficient shows substantial agreement between the predicted and actual conditions, as it falls within the range of 61% to 80%.

The confusion matrix for the random forest model is given in Table 4. Classification through the random forest algorithm places most of the predicted data in the correct category. The TPR is 87.18%, 86.09%, and 83.33% for bubbles with elliptical, spherical-cap, and irregular shape, respectively. The TNR is 88.77% for elliptical bubbles, 86.27% for spherical-cap bubbles, and 98.16% for irregular bubbles. Such high TPR and TNR values show that each classifier of the model performs well. The calculated Kappa coefficient is 76.41%, which indicates that the classification accuracy of the model is substantially high [38].

Figure 14 shows the sensitivity of each variable in the model developed using the random forest algorithm, where P is the degree of importance. P is calculated from the Gini index (GI), the number of features m, and the Gini indices $GI_i$ and $GI_r$ of the two new nodes after branching. Basically, as the Re, Eo, and We numbers increase, the bubbles change from elliptical shape to either spherical-cap shape or other irregular shapes. Due to experimental constraints, only a limited range of the Re, Eo, and We numbers was applied in this study. Nonetheless, the relevant analyses were obtained via the bubble phase diagram proposed by Cliff [37].
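The indicators reported for Tables 2 to 4 can all be derived from a confusion matrix. The sketch below computes the per-class TPR and TNR using the usual one-vs-rest definitions, together with Cohen's Kappa coefficient; whether the study used exactly these definitions is an assumption, the function names are hypothetical, and the example matrix is made up for illustration rather than taken from the tables.

```python
def per_class_rates(matrix):
    """Return a (TPR, TNR) pair for each class.
    matrix[i][j] = number of samples of actual class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    rates = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                  # actual c, predicted otherwise
        fp = sum(row[c] for row in matrix) - tp   # predicted c, actually otherwise
        tn = total - tp - fn - fp
        rates.append((tp / (tp + fn), tn / (tn + fp)))
    return rates

def cohen_kappa(matrix) -> float:
    """Agreement between predicted and actual labels, corrected for chance."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(len(matrix))
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)

# Made-up 3-class confusion matrix (rows: actual, columns: predicted).
example = [[8, 2, 0], [1, 7, 2], [0, 1, 9]]
```

For this made-up matrix the first class has a TPR of 0.80 and a TNR of 0.95, and the Kappa coefficient is 0.70, which would fall in the 61% to 80% band described above as substantial agreement.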
In general, single bubble motion in still water is mainly affected by viscous, gravitational, surface tension, and inertial forces. According to previous studies, the significance of the different forces exerted on the bubbles varies with bubble size. For instance, surface tension has the most significant impact on bubbles with a diameter of less than 6 mm, whereas bubbles larger than 6 mm are mainly affected by inertial force [1,11,39].
As displayed in Figure 14, the Eo number has the most significant effect on the shape of single bubbles, with a P value of 0.407. It is followed by the Re number with a value of 0.34, and the least significant parameter is the We number with a value of 0.253. This is mainly because the Eo number is the parameter that represents the effect of surface tension on the motion characteristics of single bubbles. The finding is in line with [1,11,39]: the diameter of the bubbles in this study ranges from 2 to 7.5 mm (mostly below 6 mm), and thereby surface tension acts as the main force affecting the bubbles. The surface tension is reflected in particular through the dynamic pressure difference. In other words, when the dynamic pressure difference increases, the bubble surface tension changes drastically and complicates the changes in bubble shape.

Figure 14. Degree of importance for each parameter in the random forest prediction model.

Figure 15 shows the distribution of bubble shapes under different Re number and Eo number conditions, as predicted by the KNN, random forest, and logistic regression prediction models. According to the shape distribution in Figure 15, when the Eo number falls within the range of 0.5 to 2 and the Re number is between 600 and 2000, the bubble is elliptical in shape. When the Eo number ranges from 2 to 5 and the Re number from 900 to 2000, the bubble has a spherical-cap shape. Meanwhile, when the Eo number falls within the range of 4.5 to 8 and the Re number from 2000 to 5000, the bubble is irregular in shape.
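The ranges quoted above can be written down as a simple lookup, which also makes their overlap explicit. The following is only an illustrative sketch: it is not one of the three trained classifiers, the names `SHAPE_REGIONS` and `candidate_shapes` are hypothetical, and treating each range as a closed rectangle in the (Eo, Re) plane is an assumption.

```python
# Ranges as quoted in the text for Figure 15:
# (Eo_min, Eo_max, Re_min, Re_max), treated here as closed intervals.
SHAPE_REGIONS = {
    "elliptical": (0.5, 2.0, 600.0, 2000.0),
    "spherical-cap": (2.0, 5.0, 900.0, 2000.0),
    "irregular": (4.5, 8.0, 2000.0, 5000.0),
}


def candidate_shapes(eo, re):
    """Return every shape whose quoted (Eo, Re) range contains the point."""
    return [
        name
        for name, (eo_lo, eo_hi, re_lo, re_hi) in SHAPE_REGIONS.items()
        if eo_lo <= eo <= eo_hi and re_lo <= re <= re_hi
    ]


print(candidate_shapes(1.0, 1000.0))  # inside the elliptical range only
print(candidate_shapes(4.8, 2000.0))  # on the shared edge of two ranges
print(candidate_shapes(1.0, 100.0))   # outside all quoted ranges
```

A point can fall in more than one range or in none, so the quoted ranges describe where each shape was observed rather than a complete partition of the (Eo, Re) plane.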

On the other hand, Figure 15 shows that the predictions of the KNN algorithm overlap when the Eo number ranges from 0.7 to 2. The same phenomenon occurs for the random forest algorithm when the Eo number ranges from 1 to 2.
Overall, based on the discussion in this section, the prediction model developed using the logistic regression algorithm achieved the best performance among the three examined models. Hence, it appears to be the most suitable model for bubble shape prediction.

Conclusions
The physical motion of single bubbles in still water is the main focus of this study, as it plays a significant role in engineering applications, especially in marine engineering.
Since machine learning has been widely used in different fields of study, it is applied in this research to both regression and classification problems. The major aim of this study is to develop prediction models for the final velocity, drag coefficient, and shape of single bubbles.
For the final velocity prediction model, a feed forward back propagation neural network (FBNN) was selected to establish the relationship between the dimensionless parameters (Re number, Eo number, and We number) and the bubble final velocity. The model achieved an R² value of 0.83 and an RMSE value of 0.0518, indicating that it can predict the bubble final velocity with a reasonably high level of accuracy.
The FBNN approach was also chosen for the bubble drag coefficient prediction model. The dimensionless parameters (Re number, Eo number, and We number) were used as the inputs and the bubble drag coefficient as the output. The R² and RMSE values of the model are 0.92 and 0.1534, respectively, meaning that the model achieved a good prediction performance. The values predicted by the developed model are more accurate than those from the empirical formulas proposed by Tomiyama (2002) and Dijkhuizen (2010).
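For reference, the two metrics quoted for the regression models can be computed as in the following minimal sketch. It uses the common definitions, RMSE as the square root of the mean squared residual and R² as one minus the ratio of the residual to the total sum of squares. Whether the study used exactly this form of R² is an assumption, and the sample values below are hypothetical, not experimental data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted values, for illustration only.
measured = [0.20, 0.24, 0.28, 0.32]
predicted = [0.21, 0.23, 0.29, 0.31]
print(f"RMSE = {rmse(measured, predicted):.4f}")
print(f"R^2  = {r_squared(measured, predicted):.3f}")
```

RMSE carries the units of the predicted quantity, so the RMSE values of the velocity and drag coefficient models are not directly comparable with each other, whereas R² is dimensionless.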
For the bubble shape, the KNN, logistic regression, and random forest algorithms were used to develop the prediction models. The performance of each approach was evaluated, and the logistic regression model achieved the best performance.
The models developed in this study are expected to benefit engineers, scientists, and hydrologists dealing with problems related to the motion of single bubbles in still water.