Machine Learning Model of Dimensionless Numbers to Predict Flow Patterns and Droplet Characteristics for Two-Phase Digital Flows

Abstract: In digital microfluidic experiments, droplet characteristics and flow patterns are generally identified and predicted by empirical methods, which are ill-suited to mining large amounts of data. In addition, because human intervention is inevitable, inconsistent judgment standards make comparison between different experiments cumbersome and almost impossible. In this paper, we used machine learning to build algorithms that automatically identify, judge, and predict flow patterns and droplet characteristics, turning empirical judgment into an intelligent process. Unlike the usual machine learning approaches, a generalized variable system was introduced to describe the different geometry configurations of digital microfluidics. Specifically, Buckingham's theorem was adopted to obtain multiple groups of dimensionless numbers as the input variables of the machine learning algorithms. In verification, the SVM and BPNN algorithms successfully classified the flow patterns and predicted the droplet characteristics (length and frequency). Compared with the primitive parameter system, the dimensionless number system showed superior predictive capability. The dimensionless numbers selected for the machine learning algorithms should carry strong physical meaning rather than only mathematical meaning. Applying dimensionless numbers reduced the dimensionality of the system and the amount of computation without losing the information carried by the primitive parameters.


Introduction
Microfluidic technology can generate micron- or even nanometer-scale droplets of uniform size, and it has been widely used in biology [1,2], chemistry [3], heat transfer [4], petroleum engineering [5], etc. Two-phase and three-phase flows [6] are common in microfluidics. Many papers have focused on flow-state modeling of two-phase flow in microchannels, especially on the flow patterns and droplet characteristics that govern droplet generation.
Prediction models [7] of flow patterns and droplet characteristics [8,9] (including droplet length and frequency) have commonly been established by multivariate regression analysis based on the primitive parameters of experiments. However, the physical mechanism of two-phase flow in microchannels is very complex, and the flow patterns and droplet characteristics are highly sensitive to the experimental variables [10][11][12]. Thus, these prediction models depend strongly on the experiments they were fitted to and cannot easily be extrapolated.
Machine learning (ML), which uses algorithms to improve the accuracy of prediction models [13], has been introduced to replace multivariate regression analysis in microchannel two-phase flow. Timung and Mandal [14] established an ML model based on a probabilistic neural network to predict two-phase flow patterns. The model took different primitive parameters as input, e.g., the superficial velocities, property parameters, and geometry parameters, to predict the flow patterns in circular microchannels (Table 1a). A comparison between the empirical model and the ML model showed that the ML model predicted the flow patterns more accurately. Nandagopal et al. [15] studied the angle of a Y-shaped microchannel (Table 1b) and used the superficial velocities of the two phases as input variables; their ML model predicted the flow patterns in liquid-liquid two-phase flow. Al-Naser et al. [16] established an ML prediction model that used the dimensionless Reynolds number and the pressure drop multiplication factor to predict the flow patterns in a circular tube. Lin et al. [17] used the superficial velocities of the two phases and the inclination angle of the pipe as input variables to predict the flow patterns of air-water two-phase flow. Zhang et al. [18] used ML algorithms for real-time flow regime identification and found that the SVM algorithm performed best at recognizing flow patterns.
Compared with the extensive literature on predicting flow patterns with ML models, little research has addressed the prediction of droplet characteristics in two-phase flow. From the ML perspective, predicting flow patterns is a classification task, whereas predicting droplet characteristics is a regression task. Most ML models for flow patterns use the primitive parameters directly from experiments to solve the classification task. For droplet characteristics, however, ML models using the primitive parameters are likely to fail at the regression task because the primitive parameters carry different dimensions. In practice, ML models that predict droplet characteristics should therefore adopt dimensionless numbers.
The different geometry configurations have been categorized as co-flow (Table 1a), Y/T-junction (Table 1b), cross-junction (Table 1c), and tapered co-flow (Table 1d). Many parameters influence the flow patterns and droplet characteristics in a microchannel with two-phase digital flow. This paper categorized all of them as primitive parameters, comprising the geometry parameters, the process parameters, the property parameters, and the target parameters. In addition, it normalized the different geometry configurations to the tapered co-flow model by adding two parameters, α and X (Table 1d). An implicit function equation was established to describe the experimental system of two-phase digital flow. Then, Buckingham's theorem [19][20][21] was applied to transform the primitive parameters in the implicit function into dimensionless numbers. Two groups of experiments with different two-phase materials were examined with the ML model, which used the dimensionless numbers as input variables. The SVM algorithm was used to recognize the flow patterns, and the BPNN algorithm was used to predict the droplet characteristics (droplet length Ld and droplet frequency f).

Methods and Data
The flow chart of the ML algorithms is shown in Figure 1. First, the different geometry configurations in two-phase digital flow were normalized with the generalized primitive parameter system. Second, the implicit function equation was written in terms of the primitive parameters and then, through the Buckingham theorem, in terms of dimensionless numbers. Third, the ML model fed the dimensionless numbers as input variables to the different algorithms. Finally, the SVM algorithm [29] classified the flow patterns, and the BPNN algorithm [30] predicted the droplet characteristics in the slug and dripping patterns.
Binkhonain and Zhao [31] pointed out that, among the common classification algorithms in supervised learning, the SVM algorithm was more effective and widely used. The BPNN algorithm has produced suitable results on regression tasks in different fields [32][33][34].

Generalized Variable System to Diverse Geometry Configurations
The various microchannel configurations are listed with their geometry parameters in Table 1. A generalized variable system is needed to describe these different geometry configurations uniformly.
Rectangular and circular cross-sections are the common shapes, characterized by W and h, and by d, respectively. They are equivalent physical quantities obeying the following equation,
where d is the diameter of the circular cross-section, W is the width of the rectangular cross-section, and h is the height of the rectangular cross-section.
Unlike other publications, our team proposed a novel normalized model of the microchannel based on the convergent coaxial (or tapered co-flow) structure [27,28].
In Table 1, under the generalized variable system, the rectangular microchannel has 8 independent geometry parameters (Wc, hc, Wd, hd, Wout, hout, X, α), more than the circular microchannel, which has 5 (dc, dd, dout, X, α). The circular microchannel is a degenerate case of the rectangular one with fewer geometry parameters. The rectangular form is therefore the more general description of the cross-section.
All primitive parameters in the experiments fall into two parts, the independent variables and the target variables. The independent variables comprise the geometry, process, and property parameters, and they govern the target variables. The implicit function between the independent and target variables is given in Equation 2.
$$F(\text{geometry}, \text{process}, \text{property}, \text{target}) = 0 \tag{2}$$
where geometry denotes the geometry parameters, process the process parameters, property the property parameters, and target the droplet characteristics.
Besides the geometry parameters (Wc, hc, Wd, hd, Wout, hout, X, α), the independent variables comprise the process parameters uc and ud and the property parameters ρc, ρd, μc, μd, and σ. The target variables are Ld and f. Substituting all of these parameters into Equation 2 gives an implicit equation in 17 primitive parameters with 3 basic dimensions, M (mass), L (length), and T (time) (Equation 3).
$$F(W_c, h_c, W_d, h_d, W_{out}, h_{out}, X, \alpha, u_c, u_d, \rho_c, \rho_d, \mu_c, \mu_d, \sigma, L_d, f) = 0 \tag{3}$$
Because they carry different dimensions, the primitive parameters cannot be weighted against each other directly, so they should be converted to a dimensionless form. The Buckingham theorem, widely used in other fields [20,21], is a mathematical method for transforming dimensional quantities into dimensionless ones. Solving Equation 3 with the Buckingham theorem transforms the 17 dimensional primitive parameters into 14 dimensionless numbers (π) in Equation 4.
$$\Phi(\pi_1, \pi_2, \ldots, \pi_{14}) = 0 \tag{4}$$
Equation 4 shows that transforming the primitive parameters into dimensionless numbers decreases the dimensionality of the implicit function. Table 2 lists three solution sets of dimensionless numbers, Set1, Set2, and Set3. The numbers in Set1 and Set2 are common hydrodynamic dimensionless numbers (Oh, We, Re, St, Su, Ca [35][36][37]) with a clear physical interpretation. In contrast, the target variable π14 in Set3 has no clear physical interpretation.
The input and output variables of the ML algorithms are chosen from these dimensionless numbers. The following sections compare the accuracy of the ML model across the three solution sets for two groups of experiments with different materials.
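The reduction from 17 primitive parameters to 14 dimensionless numbers can be checked with a short calculation. The sketch below is only an illustration: it assumes that X is a length and that α is a dimensionless angle, and the Weber-type group at the end uses W_d as its length scale purely as an example, not as the definition used in Table 2.

```python
from fractions import Fraction

# Exponents of (M, L, T) for each of the 17 primitive parameters.
# Assumptions: X is a length, alpha is a dimensionless angle.
DIMENSIONS = {
    "W_c": (0, 1, 0), "h_c": (0, 1, 0),
    "W_d": (0, 1, 0), "h_d": (0, 1, 0),
    "W_out": (0, 1, 0), "h_out": (0, 1, 0),
    "X": (0, 1, 0),
    "alpha": (0, 0, 0),
    "u_c": (0, 1, -1), "u_d": (0, 1, -1),
    "rho_c": (1, -3, 0), "rho_d": (1, -3, 0),
    "mu_c": (1, -1, -1), "mu_d": (1, -1, -1),
    "sigma": (1, 0, -2),
    "L_d": (0, 1, 0),
    "f": (0, 0, -1),
}


def matrix_rank(rows):
    """Rank of a small matrix by exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_dimensionless(exponents):
    """True if the product of parameters raised to the given powers has no net dimension."""
    totals = [0, 0, 0]
    for name, power in exponents.items():
        for i, d in enumerate(DIMENSIONS[name]):
            totals[i] += power * d
    return totals == [0, 0, 0]


n_params = len(DIMENSIONS)
rank = matrix_rank(list(DIMENSIONS.values()))
n_pi = n_params - rank  # number of independent dimensionless groups

# Example group (illustrative length scale): rho_d * u_d^2 * W_d / sigma
weber_like = {"rho_d": 1, "u_d": 2, "W_d": 1, "sigma": -1}
print(n_params, rank, n_pi, is_dimensionless(weber_like))
```

The rank of the dimension matrix equals the number of basic dimensions actually spanned (here M, L, and T), so the number of independent groups is the parameter count minus that rank.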

Input Variables Using Dimensionless Numbers or Primitive Parameters
In an ML model, the input and output variables should be determined first. In the three dimensionless solution sets, π13 and π14 are the output variables corresponding to Ld and f, respectively. The input variables of the algorithms were selected from π1 to π12 according to three rules: (1) they should carry all the specimen characteristics needed to represent the different experiments; (2) they should carry little redundant information, to decrease the amount of computation; (3) those that do not vary should be dropped to reduce dimensionality [38].
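Rule (3) can be sketched as a simple filter that removes any input whose value never changes across the samples. The function name, the field names such as "pi_7", and the sample values below are hypothetical and serve only to illustrate the idea.

```python
def drop_invariant_inputs(samples, tol=1e-12):
    """Remove every input that does not vary across the samples.

    samples: list of dicts mapping an input name (e.g. "pi_7") to its value.
    Returns the reduced samples and the names of the inputs that were kept.
    """
    names = list(samples[0])
    varying = [
        n for n in names
        if max(s[n] for s in samples) - min(s[n] for s in samples) > tol
    ]
    reduced = [{n: s[n] for n in varying} for s in samples]
    return reduced, varying


# Hypothetical values: pi_1 is constant, so it carries no information.
samples = [
    {"pi_1": 2.0, "pi_7": 0.010, "pi_10": 1.5},
    {"pi_1": 2.0, "pi_7": 0.030, "pi_10": 0.9},
    {"pi_1": 2.0, "pi_7": 0.002, "pi_10": 1.1},
]
reduced, kept = drop_invariant_inputs(samples)
print(kept)
```

A constant input cannot help the model separate samples, which is why dropping it reduces dimensionality without discarding information.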
To test the generalized variable system, this paper carried out two groups of experiments with different two-phase flows, gas-liquid and liquid-liquid, in the tapered co-flow microchannel. Each group comprised hundreds of individual experiments with different parameters. In Experiment 1 (gas-liquid two-phase flow), the continuous phase was an aqueous solution of sodium dodecyl sulfate, and the dispersed phase was argon. In Experiment 2 (liquid-liquid two-phase flow), the continuous phase was lubricating oil, and the dispersed phase was deionized water. The primitive parameters considered in Experiment 1 were α, ρc, μc, σ, uc, and ud, and those considered in Experiment 2 were X, α, uc, and ud. Table 3 lists the four input sets for the ML model in the two groups of experiments: the dimensionless numbers in Set1, Set2, and Set3, and the primitive parameters in Set4. In Experiment 1, the dimensionless sets had at most 4 input variables, whereas the primitive parameter set had 6. In Experiment 2, both the dimensionless and the primitive parameter sets had 4 input variables. The ML model was run with each set of input variables to compare the recognition accuracy of flow patterns in the two groups of experiments. How to choose appropriate dimensionless numbers to describe the generalized system is also discussed through the prediction of droplet characteristics.
Four flow patterns were identified in both Experiment 1 and Experiment 2: slug, dripping, jetting, and others (Figure 2). The droplet was a gas bubble in Experiment 1 and a liquid droplet in Experiment 2. Although the materials differed, the droplets showed similar features in the generalized variable system. Slug, dripping, and jetting are the important flow patterns because they generate highly consistent droplets [39][40][41]. The ML model was applied to classify the four flow patterns and to predict the droplet characteristics in the slug and dripping patterns, for which enough experimental data were available.

Recognition of Flow Patterns
This paper studied the accuracy of the ML model with the primitive parameters and with the dimensionless numbers as inputs. Before computing, all data from both Experiment 1 and Experiment 2 were divided into training and testing data at a ratio of 4:1, for the solution sets with dimensionless numbers (Set1, Set2, and Set3) and with primitive parameters (Set4) alike. The training data were used to establish the prediction model, and the testing data were used to verify its accuracy. The accuracy of flow pattern prediction was defined as the ratio of the number of correctly predicted samples in the test set to the number of all samples in the test set.
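The 4:1 partition and the accuracy measure can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the random shuffle, the seed, and the function names are assumptions, and accuracy is taken here as the share of test samples whose predicted flow pattern matches the observed one.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them 4:1 into training and testing data."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted, observed):
    """Fraction of test samples whose predicted label equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


train, test = split_train_test(list(range(100)))

# Hypothetical labels for five test samples.
predicted = ["slug", "dripping", "jetting", "others", "slug"]
observed = ["slug", "dripping", "jetting", "slug", "slug"]
print(len(train), len(test), accuracy(predicted, observed))
```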
After the data partition, the SVM algorithm recognized the four flow patterns with different accuracy in the two experiments (Figure 3). Figure 3a shows that, in Experiment 1, the prediction models from all solution sets, dimensionless and primitive alike, had similar accuracy, ranging from 80.6% (Set1) to 86.6% (Set4). Set1, Set2, and Set3 used 4, 3, and 4 dimensionless numbers as input variables, respectively. Set4, with the primitive parameters, had six input variables, more than Set1, Set2, or Set3.
Notably, the prediction accuracies of the four solution sets were very close. This indicates that, for the recognition of flow patterns, the SVM algorithm with dimensionless inputs limited the loss of specimen information, reduced the amount of computation, and maintained the model accuracy at the same time. Figure 3b gives similar results for Experiment 2. The recognition accuracy was nearly constant, about 93%, in all solution sets. The input variables had the same number of dimensions (4) for both the dimensionless and the primitive parameter sets. The recognition accuracy of flow patterns in Experiment 2 was roughly 10 percentage points higher than in Experiment 1. Figure 3 also shows that recognizing flow patterns with dimensionless numbers neither decreased the model accuracy nor lost information compared with the primitive parameters. Figure 4 illustrates the difference in recognition accuracy between Experiment 1 and Experiment 2, taking Set2 as an example. In Experiment 1, Set2 has three dimensionless numbers, π7, π10, and π12 (Table 3). In Experiment 2, Set2 has four dimensionless numbers, π6, π7, π10, and π12 (Table 3). In a classification task, a distinct boundary between different classes of data makes the classification easier to solve. Here, the data in Experiment 2 were separated by much clearer boundaries, which improved the recognition accuracy significantly. Figure 4 thus indicates that the difference in prediction accuracy between Experiment 1 and Experiment 2 can be attributed to how clearly the data distribution is separated.

Prediction of Droplet Characteristics
The primitive parameters were not used as input variables to predict the droplet characteristics, since they would make the prediction model lose its physical meaning. Therefore, this paper only discusses the solution sets with dimensionless numbers for predicting the droplet characteristics.
The data sets of Experiment 1 and Experiment 2 were each divided into two parts, Region A and Region B, at a ratio of about 4:1. The data in Region A were used to establish the prediction model, and the data in Region B were used to verify it. Within Region A, the data were further divided into training and testing data at a ratio of 4:1.
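The Region A / Region B split can be sketched as a cut at a threshold value of one dimensionless number, with Region A at or below the cut and Region B above it. The field name "We_d", the sample values, and the choice to place samples lying exactly on the line in Region A are assumptions made for illustration.

```python
def split_regions(samples, key, threshold):
    """Split samples into Region A (at or below the line) and Region B (above it)."""
    region_a = [s for s in samples if s[key] <= threshold]
    region_b = [s for s in samples if s[key] > threshold]
    return region_a, region_b


# Hypothetical samples described by a dispersed-phase Weber number.
samples = [{"We_d": 0.005}, {"We_d": 0.012}, {"We_d": 0.02}, {"We_d": 0.03}]
region_a, region_b = split_regions(samples, "We_d", 0.02)
print(len(region_a), len(region_b))
```

Because Region B lies entirely on one side of the line, verifying on it tests how well the model extends beyond the range of values it was built on.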
In Experiment 1, the slug and dripping data in Set1 were partitioned by the line Wed = 0.02 (Table 4a,b). The data in Region B lay above this line, and the data in Region A lay below it. Similarly, the slug and dripping data in Set2 and Set3 were partitioned by the lines Wed = 0.02 and Cad = 0.002, respectively (Table 4c-f). Table 5 shows the data partition of the slug and dripping patterns for the three sets in Experiment 2, where the partition lines used the same parameters as in Experiment 1. The BPNN algorithm was used to predict the droplet characteristics, π13 and π14. Before being input, the data were normalized to x' ∈ (0,1) to improve the convergence of the algorithm (Equation 5).

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{5}$$
where x is the raw value, x' is the normalized value, and xmax and xmin are the maximum and minimum values of x, respectively.
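A minimal sketch of this normalization and its inverse is given below, assuming Equation 5 is the usual min-max scaling built from xmin and xmax. The function names are hypothetical. With this form the extreme values map exactly to 0 and 1.

```python
def fit_min_max(values):
    """Return (x_min, x_max) of the data used to define the scaling."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot normalize a constant variable")
    return x_min, x_max


def normalize(x, x_min, x_max):
    """Map x onto the unit interval using min-max scaling."""
    return (x - x_min) / (x_max - x_min)


def denormalize(x_norm, x_min, x_max):
    """Inverse of normalize: recover the original value from the scaled one."""
    return x_norm * (x_max - x_min) + x_min


values = [2.0, 4.0, 6.0]  # hypothetical raw values of one variable
x_min, x_max = fit_min_max(values)
scaled = [normalize(v, x_min, x_max) for v in values]
restored = [denormalize(s, x_min, x_max) for s in scaled]
print(scaled, restored)
```

The inverse mapping is what allows a prediction made in normalized form to be converted back to the scale of the dimensionless number before any further conversion.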
After normalization, the data in Region A were input to the BPNN algorithm to establish the prediction model of droplet characteristics for each set. Then the data in Region B were input to the prediction model to predict π13 and π14, from which the parameters Ld and f were recovered by the inverse operation. Table 6 shows the model errors for the slug and dripping patterns predicted with the three sets in Experiment 1 and Experiment 2. The errors were obtained by comparing the predicted values with the measured values for all samples in Region B. The model errors for Experiment 1 are shown in Table 6a,b. Compared with the measured values, the predictions of droplet length Ld and droplet frequency f in the slug pattern fell within an error range of ±20% for both Set1 and Set2, but exceeded ±20% for Set3 (Table 6a). The same behavior was found in the dripping pattern (Table 6b).
The model errors of the droplet characteristics Ld and f in the slug and dripping patterns of Experiment 2 are shown in Table 6c,d. Compared with the measured values, the model errors of length Ld and frequency f for slug droplets were within ±20% for Set1, Set2, and Set3 alike (Table 6c). Similarly, the prediction errors in the dripping pattern were within ±20% for all three sets (Table 6d).
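The ±20% comparison can be sketched as below. It assumes that the model error is the relative deviation of the predicted value from the measured value, which is one common reading of such an error band. The function names and numbers are hypothetical.

```python
def relative_error(predicted, measured):
    """Signed relative deviation of a prediction from the measured value."""
    return (predicted - measured) / measured


def fraction_within_band(predicted, measured, band=0.20):
    """Share of samples whose relative error lies within +/- band."""
    errors = [relative_error(p, m) for p, m in zip(predicted, measured)]
    inside = sum(abs(e) <= band for e in errors)
    return inside / len(errors)


# Hypothetical droplet lengths (same unit for both lists).
measured = [100.0, 100.0, 100.0, 100.0]
predicted = [110.0, 90.0, 130.0, 100.0]
print(fraction_within_band(predicted, measured))
```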
The model error of Set3 was worse than that of Set1 and Set2 in Experiment 1, whereas the errors of Set1, Set2, and Set3 in Experiment 2 were similar. The difference in prediction accuracy among the three sets is mainly explained by the relationship between the dimensionless numbers and their physical meanings.
In all three sets, the predicted parameters Ld and f were mapped from the dimensionless numbers π13 and π14. The accuracy differences for the slug and dripping patterns in the different experiments were strongly related to the values of π13 and π14 (Figures 5 and 6). In the slug and dripping patterns of Experiment 1, π13 ranged in order of magnitude from 10^-3 to 10^0 in Set1 and Set2 and from 10^-4 to 10^0 in Set3, so the range in Set3 was one order of magnitude wider than in Set1 and Set2 (Figure 5a). Similarly, π14 ranged from 10^-3 to 10^0 in Set1 and Set2 and from 10^-7 to 10^0 in Set3, a range four orders of magnitude wider (Figure 5b). This shows that π13 and π14 in Set3 were distributed over a much wider range than in Set1 and Set2, which lowered the accuracy of the prediction model. Figure 6 shows the data distribution of π13 and π14 in Experiment 2. All values of π13 and π14 in the three sets ranged in order of magnitude from 10^-3 to 10^0, with no distinct gap among the sets. As a result, the prediction accuracy of droplet characteristics was similar for the three sets in Experiment 2 and higher than in Experiment 1. This also indicates that a narrower distribution of the dimensionless numbers can improve the model accuracy significantly.
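The spread of a dimensionless number can be quantified as the number of decades between its smallest and largest values. The sketch below is one simple way to compute it. The function name is hypothetical, and the sample values are invented to match only the reported end points of the ranges.

```python
import math


def decades_spanned(values):
    """Number of orders of magnitude between the smallest and largest positive value."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("at least one positive value is required")
    return math.log10(max(positive) / min(positive))


# Hypothetical values with end points matching the reported ranges of pi_14.
narrow = [1e-3, 0.05, 1.0]   # range like Set1 and Set2
wide = [1e-7, 1e-2, 1.0]     # range like Set3 in Experiment 1
print(decades_spanned(narrow), decades_spanned(wide))
```

Comparing this measure across candidate solution sets gives a quick numerical version of the range criterion discussed in the text.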
It should be emphasized that Buckingham's theorem, in transforming the primitive variable equation into a dimensionless one, yields not a single solution set but infinitely many. Although all solution sets are mathematically equivalent, they describe the same physical system in different ways. In the three sets, π14 took a different form in each set (St in Set2; see Table 2 for Set1 and Set3) while corresponding to the same physical system. The expressions of π14 in Set1 and Set2 are traditional hydromechanical dimensionless numbers documented in the literature (Table 2), whereas the expression in Set3 is not commonly used and does not describe the physical system clearly. Set3 was a random solution drawn from the infinite set of solutions. To achieve higher prediction accuracy, the solution set selected from the dimensionless implicit function should have strong physical meaning rather than mathematical meaning alone. The range of the data distribution can serve as a criterion for judging whether a solution set has stronger physical meaning in an ML model with dimensionless numbers.
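The mathematical equivalence of different solution sets can be illustrated with the standard textbook definitions of four of the groups named earlier. The relations below use a generic velocity u, length scale L, density ρ, viscosity μ, and interfacial tension σ. They are given only as an illustration and do not reproduce the specific scales used in Table 2.

```latex
\begin{align}
\mathrm{We} &= \frac{\rho u^{2} L}{\sigma}, &
\mathrm{Re} &= \frac{\rho u L}{\mu}, \\
\mathrm{Ca} &= \frac{\mu u}{\sigma} = \frac{\mathrm{We}}{\mathrm{Re}}, &
\mathrm{Oh} &= \frac{\mu}{\sqrt{\rho \sigma L}} = \frac{\sqrt{\mathrm{We}}}{\mathrm{Re}}.
\end{align}
```

Any product of powers of dimensionless groups is itself dimensionless, so replacing We by Ca or Oh changes the description but not the information it contains. This is why infinitely many equivalent sets exist and why a criterion beyond mathematical validity is needed to choose among them.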

Conclusions
This paper introduced a generalized variable system to describe the different geometry configurations of the microchannel. Buckingham's theorem was used to transform the primitive parameters of the implicit function equation into dimensionless numbers. With the dimensionless numbers as inputs to the machine learning model, the SVM algorithm recognized the flow patterns, and the BPNN algorithm predicted the droplet characteristics.
Within the same group of experiments, the SVM algorithm recognized the flow patterns with almost the same accuracy whether dimensionless numbers or primitive parameters were used. The accuracy differed between the two groups of experiments, mainly because the boundaries between adjacent flow patterns were unclear and overlapped in Experiment 1. Compared with the primitive parameters, the prediction model using dimensionless numbers reduced the dimensionality of the input variables, saved computing time, and maintained the model accuracy.
The infinitely many solution sets of dimensionless numbers are mathematically equivalent. However, the prediction accuracy of droplet characteristics differed among the solution sets. Based on the two groups of experiments, a widely scattered distribution of the dimensionless numbers (π13 and π14) had a significant negative effect on the model accuracy. Furthermore, solution sets that are mathematically equivalent do not all carry the same physical meaning. To achieve higher prediction accuracy, the selected solution set should have strong physical meaning, and its data should be distributed over a narrow range.
We established a normalized model for two-phase flow in this paper. However, how to extend the model correctly to multiphase flow (e.g., three-phase flow) still requires further research and improvement of the model. In addition, the experiments in this paper only supported the normalized geometry configurations of the model. When the experimental materials change, the effects of the physical parameters should be considered carefully in the implicit function.

Institutional Review Board Statement:
The study did not involve humans or animals.

Informed Consent Statement:
The study did not involve humans or animals.

Data Availability Statement:
The study did not report any data.

Conflicts of Interest:
The authors declare no conflicts of interest.