Modeling of the Drag Force in Polydisperse Gas–Solid Flow via an Efﬁcient Supervised Machine Learning Approach

Most granular flows in nature and industrial processing are polydisperse, yet simulations are usually restricted to monodisperse drag force models because a drag model that accounts for polydispersity is difficult to establish. Ignoring polydispersity often produces obvious deviations between simulated and experimental outcomes, and the characteristics of polydispersity in the drag force are generally very hard to capture with an analytic expression. Artificial neural network (ANN) models, by contrast, can estimate such quantities with better accuracy. In this work, an ANN is adopted to model the drag force in polydisperse granular flows. To construct a reasonable ANN algorithm for the polydisperse drag force, the structure of the ANN is carefully designed. For training the ANN drag model, a direct numerical simulation method based on the lattice Boltzmann method (LBM) is proposed to generate the training data, and an adaptive data filtering algorithm, termed the optimal contribution rate algorithm (OCRA), is introduced to improve the training efficiency and avoid over-fitting. The results show that the polydispersity of the system is well captured by the ANN drag model over a relatively wide range of particle concentrations, and the predicted results coincide well with the experimental ones. Moreover, the ANN drag model is not only effective for polydisperse systems but also compatible with monodisperse systems, which is impossible for traditional drag models.

In recent years, there have been many important studies on drag force models for polydisperse granular systems. In 2005, van der Hoef et al. [24] extended the monodisperse drag force model of Hill et al. [25] to polydisperse granular systems by introducing a corrected volume fraction, a diameter fraction, and the Sauter mean diameter. In 2006, Benyahia et al. [26] extended the applicability of the Hill et al. formula [25] to a wider range of Reynolds numbers. In 2007, Beetstra et al. [27] indicated that the linear dependence on the Reynolds number in the Ergun drag force model [28] is insufficient, and subsequently revised the model using direct simulation data obtained from lattice simulations of incompressible flow, improving on their previous work [41]. However, their drag model still mainly focused on dilute polydisperse flows, and its applicability to dense monodisperse granular flow has not yet been verified. Beyond that, there are other works on ANN-based drag forces [42][43][44].
The above studies took full advantage of the ANN's self-learning ability and high compatibility in modeling the drag force. Compared with classical mathematical analysis approaches, an ANN can learn complex nonlinear relationships from datasets spontaneously, and it has the advantages of a flexible structure, simple algorithms, self-learning ability, and good compatibility [45]. The strong predictive capabilities of ANNs in modeling the drag force have been well demonstrated in these studies. However, the resulting models still have three imperfections. First, the polydisperse models are applicable to relatively dilute, low-to-intermediate Reynolds number flow systems; they can scarcely be adopted in relatively dense granular systems. Second, the polydisperse drag models are hardly compatible with monodisperse granular systems, since the training samples came mainly from direct simulations of polydisperse granular systems. Third, the polydisperse drag models suffer from low computational efficiency and over-fitting problems because of the huge training sample set, and these studies rarely address training strategies that improve computational efficiency and avoid over-fitting. The primary objective of this paper is therefore to extend the applicability of the ANN drag model and improve its computational efficiency. The original contribution of this paper is threefold. First, a drag model with wide applicability in solid fraction is proposed using ANN techniques. Second, the proposed ANN drag model is not only applicable to polydisperse granular systems but also gives good prediction accuracy for monodisperse granular systems. Third, a new adaptive data filtering algorithm based on the concept of a contribution rate is proposed to improve the training efficiency and avoid over-fitting, automatically eliminating redundant and singular samples from the training set.
The paper is organized as follows. In Section 2, we introduce the numerical method for generating the dataset and how the dataset is split. In Section 3, the structure and parameters of the ANN for the polydisperse drag model are discussed, and an efficient and accurate adaptive algorithm for training the ANN drag model is proposed. Subsequently, in Section 4, the developed model is compared with the Beetstra et al. (BVK) [27], Cello et al. [21], and Holloway and Sundaresan (HS) [30] formulas to demonstrate the validity, feasibility, and practicability of the model. Finally, some concluding remarks are given in Section 5.

Numerical Method
LBM, as a mesoscopic method, can be considered a numerical method independent of continuous kinetic theory. Compared with traditional macroscopic numerical methods, LBM has the advantages of easy coding [46], good parallelism, easy handling of geometrically complicated boundaries, and an accurate description of the macroscopic gas-solid motion at the microscopic level [46,47]. Considering its computational accuracy and efficiency for large complex fluid systems, LBM is therefore well suited to polydisperse granular flow.
So far, LBM is widely adopted as a solver to describe the characteristics of granular flow in which rigid granules are immersed in a continuum Newtonian fluid. In this context, we start with the Navier-Stokes equations, which are recovered from the lattice Boltzmann equation [48]:

f_\alpha(\mathbf{x} + \mathbf{e}_\alpha \delta t,\, t + \delta t) - f_\alpha(\mathbf{x}, t) = -\frac{1}{\tau}\left[ f_\alpha(\mathbf{x}, t) - f_\alpha^{eq}(\mathbf{x}, t) \right], \quad \alpha = 0, 1, \ldots, n-1,

where f_\alpha(\mathbf{x}, t) is the distribution function with discrete velocity \mathbf{e}_\alpha at position \mathbf{x} and time t, \delta t is the time step size, n is the total number of discrete velocities, and \tau is the relaxation time. The equilibrium distribution function [48] can be described as

f_\alpha^{eq} = \omega_\alpha \rho \left[ 1 + \frac{\mathbf{e}_\alpha \cdot \mathbf{u}}{c_s^2} + \frac{(\mathbf{e}_\alpha \cdot \mathbf{u})^2}{2 c_s^4} - \frac{\mathbf{u}^2}{2 c_s^2} \right],

where \omega_\alpha is the model-dependent weight coefficient and c_s = \sqrt{RT} (R is the gas constant) is the lattice sound speed. The fluid density \rho and velocity \mathbf{u} are the zeroth- and first-order moments of f_\alpha, respectively:

\rho = \sum_\alpha f_\alpha, \qquad \rho \mathbf{u} = \sum_\alpha \mathbf{e}_\alpha f_\alpha.

The fluid pressure is defined directly as p = c_s^2 \rho. Through the Chapman-Enskog expansion, the Navier-Stokes equations are recovered with the kinematic viscosity [49]

\nu = c_s^2 \left( \tau - \frac{1}{2} \right) \delta t.

According to the definition of the Reynolds number Re = UL/\nu, the relaxation time \tau [50] is calculated by

\tau = \frac{UL}{c_s^2 \, Re \, \delta t} + \frac{1}{2},

where U and L are the characteristic velocity and length, respectively. To obtain the drag force in a polydisperse granular flow, the structure of a polydisperse granular cluster in the flow field was randomly generated. Regarding the boundary conditions, our model follows the LBM model of Chen et al. [51]. The inlet boundary condition of the fluid field was set to a constant velocity, U = 0.1 m/s. At the outlet boundary, a stress-free condition was imposed, and periodic boundary conditions were applied in the other two directions. With these settings, as shown in Figure 1, the flow, moving horizontally from left to right, gradually reached a steady state, meaning that the flow field is in equilibrium at each moment.
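The relations above between the Reynolds number, the lattice viscosity, and the BGK relaxation time can be sketched in a few lines (a minimal illustration in lattice units; the values are arbitrary and are not the parameters of the solver used in this work):

```python
# Relate Re, kinematic viscosity, and BGK relaxation time in lattice units,
# following nu = c_s^2 (tau - 1/2) dt and Re = U L / nu.
CS2 = 1.0 / 3.0  # c_s^2 for standard lattices such as D3Q19

def relaxation_time(U, L, Re, dt=1.0):
    """tau chosen so that the lattice viscosity matches the target Re."""
    nu = U * L / Re
    return nu / (CS2 * dt) + 0.5

def viscosity(tau, dt=1.0):
    """Kinematic viscosity recovered by the Chapman-Enskog expansion."""
    return CS2 * (tau - 0.5) * dt
```

For stability, \tau must stay above 1/2, which bounds how small the lattice viscosity, and hence how large the resolved Reynolds number, can be made at fixed U and L.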
Considering our previous study [23], a 300 × 100 × 100 grid was employed, and a single-relaxation-time collision operator in conjunction with a D3Q19 lattice was adopted for the accuracy and stability of the model. To improve the computational efficiency for a polydisperse system, a parallel algorithm (MPI+LBM) was used to implement the numerical simulation of the flow field. An in-house C++ LBM solver developed by our research group was used, and the parameters of the direct numerical simulation (DNS) [52] are listed in Table 1. Given its simple calculation process, ease of programming, and good prediction accuracy in particle-suspended flow, the momentum exchange method [53] is adopted to calculate the drag force. Specifically, the drag force acting on the solid surface is obtained by summing the momentum exchange over each boundary link between the two opposing directions of neighboring nodes:

\mathbf{F} = \sum_{\text{links}} \mathbf{e}_i \left[ f_i(\mathbf{x}_f, t) + f_{\bar{i}}(\mathbf{x}_f + \mathbf{e}_i \delta t, t) \right],

where \mathbf{x}_f and \mathbf{x}_f + \mathbf{e}_i \delta t are the fluid node and its neighboring solid node, respectively, and \mathbf{e}_{\bar{i}} = -\mathbf{e}_i. For training the ANN, the particle Reynolds number Re_p [20] is calculated by

Re_p = \frac{\varepsilon_g \rho \left| \mathbf{u}_{slip} \right| \langle d \rangle}{\mu}, \qquad \mathbf{u}_{slip} = \mathbf{v}_g - \mathbf{v}_s,

where \mathbf{v}_g is the instantaneous gas velocity at the position of an individual granule and \mathbf{v}_s is the solid velocity; in the simulation, the solid velocity \mathbf{v}_s was set to zero. \varepsilon_s is the solid volume fraction.
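The link-wise summation of the momentum exchange method can be sketched as follows (a schematic numpy version assuming simple bounce-back at the wall; the data layout and names are illustrative and are not those of the in-house C++ solver):

```python
import numpy as np

def momentum_exchange_force(f, links, e, opp):
    """Drag force on a solid cluster from the momentum exchanged over
    all fluid-solid boundary links.

    f[i, node] : post-collision distribution in direction i at a fluid node
    links      : (node, i) pairs whose neighbor along e_i is a solid node
    e[i]       : discrete velocity vector i;  opp[i] : index of -e_i
    """
    F = np.zeros(e.shape[1])
    for node, i in links:
        # momentum carried toward the wall plus the bounced-back part
        F += (f[i, node] + f[opp[i], node]) * e[i]
    return F
```

In the full solver this sum runs over every boundary link of the granular cluster at each time step, giving the instantaneous drag on each granule.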
\langle d \rangle is the Sauter mean diameter [54], defined as

\langle d \rangle = \frac{\sum_i n_i d_i^3}{\sum_i n_i d_i^2},

where n_i is the number of granules with diameter d_i. The solid volume fraction [20] is the ratio of the volume of the granular cluster V_{gc} to the grid volume occupied by the granular cluster V_{grid},

\varepsilon_s = \frac{V_{gc}}{V_{grid}},

and the void fraction is

\varepsilon_g = 1 - \varepsilon_s.

Table 2 presents the average drag force of the granules calculated by the LBM model when the void fraction \varepsilon_g = 0.5. The dimensionless average drag coefficient is scaled by the drag force acting on a single granule in the flow field at Re_p = 1; the diameter of a single granule is 12 \Delta x. The simulation results are consistent with the published literature [19].
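The three geometric quantities above are straightforward to compute; a short sketch (the function names are ours):

```python
import numpy as np

def sauter_mean_diameter(d, n):
    """<d> = sum(n_i d_i^3) / sum(n_i d_i^2) for n_i granules of diameter d_i."""
    d, n = np.asarray(d, float), np.asarray(n, float)
    return float((n * d**3).sum() / (n * d**2).sum())

def solid_fraction(V_gc, V_grid):
    """eps_s: cluster volume over the grid volume it occupies."""
    return V_gc / V_grid

def void_fraction(V_gc, V_grid):
    """eps_g = 1 - eps_s."""
    return 1.0 - solid_fraction(V_gc, V_grid)
```

Note that for a monodisperse system the Sauter mean reduces to the single granule diameter, which is consistent with the model's compatibility claim for monodisperse inputs.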

Dataset Splitting
In the three-dimensional (3D) LBM model, 450 granules were randomly generated per simulation, and approximately 430 simulations were conducted, providing 80% of the total of 243,260 training data with Re ranging from 0.01 to 1100. The information for each granule after simulation is grouped in a set {ε_s, u_slip, <d>, Re_p, F_d}. A further 17% of the training data is taken from Ref. [24] for a monodisperse system with Re ∈ [10, 1100], and the remaining 3% is obtained from Ref. [25] for a monodisperse system with Re ∈ [0.01, 10].
The dataset is composed of samples for both monodisperse and polydisperse granular systems. The void fraction of the granular systems ranges over [0.35, 0.95]. To ensure a large diversity of configurations, five sizes of granules (see Table 1) are randomly distributed over Re ∈ [0.01, 1100] and ε_g ∈ [0.35, 0.95]. The dataset is then divided into three subsets: 70% of the samples for the training set, 15% for the validation set, and 15% for the testing set. If the ANN model shows over-fitting during prediction, a filtering algorithm is adopted to avoid it.
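A disjoint 70/15/15 split can be produced by shuffling the sample indices once (a sketch; the paper does not specify the exact shuffling procedure):

```python
import numpy as np

def split_dataset(n_samples, seed=0):
    """Return disjoint index arrays covering all samples: 70/15/15."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.70 * n_samples)
    n_val = int(0.15 * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])
```

Because the split partitions a single permutation, the three subsets are guaranteed to have no intersections, as required below.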
The training dataset is used for training the ANN model, the validation dataset for fine-tuning the hyperparameters of the model [55], and the testing dataset for evaluating the performance of the trained model. Note that there are no intersections among the three datasets, which ensures that the testing dataset reflects the generalization performance of the model on unseen data.

The Construction of ANN
A proper design of the ANN structure can greatly improve the computing efficiency and prediction accuracy of the neural network. In order to obtain a reasonable ANN frame for modeling the drag force, the learning algorithm, activation function, hidden layers, and input variables of the ANN are discussed below; the drag model is built on TensorFlow.
As the 243,260 training samples are local drag forces for individual granules, it is worth noting that the local drag force changes markedly with the sampling location and time. Moreover, mesoscale heterogeneity, particle motion characteristics, the fluid-solid density ratio, and other factors may affect the simulation results substantially. Thus, it is necessary to preprocess the local drag force data by statistical averaging, in the hope of eliminating the statistical fluctuations caused by an incomplete consideration of the influencing factors.
Because the particle Reynolds number Re_p, the solid volume fraction ε_s, and the slip velocity u_slip characterize the movement of the particles, have clear physical meaning, and have an explicit relationship with the drag force [19], these variables are chosen as inputs of the ANN drag model. Moreover, the Sauter mean diameter <d> is selected as an input to describe the polydisperse characteristics in the ANN drag model. The input-output relationship of the ANN is illustrated in Figure 2.

The Learning Algorithm of ANN Drag Model
Since the gradient descent with momentum (GDM) algorithm and the Levenberg-Marquardt (LM) algorithm are the most commonly used learning algorithms for ANNs [45], we compare their performance with respect to the convergence of the ANN. Two termination rules are adopted simultaneously: "rule A", which caps the maximum number of iterations, and "rule B", which specifies the error tolerance. To reduce the impact of the random initialization of the weights, we conducted three independent experiments and recorded the mathematical expectation of the Spearman correlation coefficient [57] between the outputs and the expected outputs in Table 3. LM gives more accurate predictions in terms of convergence and applicability than the GDM algorithm, although GDM has an advantage in computational efficiency. Following the principle of precision priority in designing the ANN, the LM algorithm is adopted as the learning algorithm.
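The Spearman coefficient used in Table 3 ranks both series before correlating them; a tie-free sketch:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation (no tie correction, for illustration)."""
    rx = np.argsort(np.argsort(x)).astype(float)  # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)  # ranks of y
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx**2).sum() * (ry**2).sum()))
```

A value near 1 between the network outputs and the expected outputs indicates nearly monotone agreement, which is why it serves here as a convergence measure.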

The Activation Function
Different activation functions influence the prediction accuracy of the ANN to different degrees; thus, it is necessary to examine the performance of the commonly used activation functions, including Sigmoid, Tanh, Leaky-ReLU, and Gauss. The predicted results of an ANN with a single hidden layer over three independent experiments are shown in Table 4. There is an obvious difference in computational accuracy among the four activation functions. Specifically, the Leaky-ReLU function shows the lowest prediction accuracy compared to the Tanh and Sigmoid functions. The predictions of the Gauss function differ significantly across the three independent experiments, which may be caused by its local response property [45]; it is therefore suggested to avoid using it to build the polydisperse drag force model. The Sigmoid and Tanh activation functions have similar prediction accuracy, and both belong to the so-called "S-functions". Because the Tanh function is symmetric about the origin of the Cartesian coordinate system, it is beneficial for preserving the original properties of the parameters [45]. Consequently, it is more reasonable to use the Tanh function as the activation function in modeling the drag force.

The Structure of Hidden Layers
The structure of the hidden layers is an important factor that influences the ANN performance. Montavon et al. [45] have proved that a one-hidden-layer back-propagation (BP) neural network can realize arbitrary nonlinear mapping. Thus, we will first discuss the influence of the number of hidden layers on the convergence speed and accuracy of the neural network with a multi-hidden-layer structure, as shown in Figure 2.
The independent experimental results of ANNs with one, two, three, and four hidden layers are given in Table 5. The calculation time and number of iterations increase with the number of hidden layers. Accordingly, considering the prediction accuracy and efficiency of the ANN, a single hidden layer is preferred in modeling the drag force. Next, we discuss the influence of the number of neurons in the single hidden layer on the convergence speed, correlation coefficient, and CPU time. The average results of three independent experiments are recorded in Table 6. Considering the influence of the number of neurons on the correlation coefficient and CPU time, a single-hidden-layer BP neural network with 15 neurons is suggested for the current study. Moreover, the impacts of the input-output correlations on the ANN are illustrated in Appendix A. Finally, we conclude that the Levenberg-Marquardt (LM) algorithm is recommended as the learning method, and a single-hidden-layer, 15-neuron back-propagation (BP) neural network with the Tanh activation function is suggested for the construction of the polydisperse granular drag force model.
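The recommended architecture amounts to one matrix-vector product through a 15-unit tanh layer followed by a linear readout; a forward-pass sketch with placeholder weights (the trained values would come from the LM fit, not from this snippet):

```python
import numpy as np

def ann_drag_forward(x, W1, b1, W2, b2):
    """4 inputs (Re_p, eps_s, u_slip, <d>) -> 15 tanh units -> 1 output."""
    h = np.tanh(W1 @ x + b1)  # single hidden layer, 15 neurons
    return W2 @ h + b2        # linear output: dimensionless drag force

# placeholder (untrained) weights, for shape illustration only
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(15, 4)), np.zeros(15)
W2, b2 = rng.normal(size=(1, 15)), np.zeros(1)
```

Because tanh is bounded, the inputs should be normalized to a comparable scale before training; otherwise the hidden units saturate and the LM updates stall.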

The Optimal Contribution Rate Algorithm (OCRA)
The ability of ANN to explore the correlation between variables is not limitless [45], and the predicted results depend significantly on the correlation between inputs and outputs. When the input samples corresponding to the output variable are incomplete and the relationship between them is not clear, it will bring great challenges to the ANN. Thus, an additional preprocessing method may be required when the correlation between the inputs and outputs is found to be ambiguous.
In the applications of the ANN-machine learning model, various fitting problems are encountered due to diverse functional realizations. In order to complete the prediction of some complex nonlinear relationships, a huge training sample set is required to construct a precise ANN prediction model. However, the huge training sample set will bring a significant reduction in the training and computing efficiency of the ANN. The reason is that according to the characteristics of the ANN-machine learning model, it is prone to result in repeated and inefficient training for ANN by using the linearly correlated training samples (i.e., inefficient training samples). Meanwhile, the training samples with the property of mutual independence (i.e., efficient training samples) are more preferable [45]. Considering that the huge training sample set for the complex nonlinear relationships contains the efficient and inefficient training samples, it is necessary to design an algorithm that can identify the different types of training samples and search for the most efficient one for training. For this purpose, we will propose an adaptive data filtering algorithm, i.e., the OCRA. The implementation of the algorithm can be divided into two steps. The first step is to analyze the contribution rate of the training samples inspired by the idea of the kernel principal component analysis (KPCA) [58]. The second step is to search for an optimal training sample set adaptively with the best contribution rate according to the certain tolerance of the training error.

Contribution Rate Analysis Algorithm
We assume that the input samples in the training model of the ANN are represented by

X = (x_1, x_2, \ldots, x_n), \qquad x_j = (x_{1,j}, x_{2,j}, \ldots, x_{m,j})^T,

where x_j denotes the j-th input sample and x_{i,j} represents the i-th index of the j-th input sample. The process of obtaining the data contribution rate and the cumulative contribution rate of the training samples is divided into four steps:

(I) Considering the logarithmic characteristics of the drag distribution in the polydisperse gas-solid system (see Table 2), and because the following contribution rate analysis requires the input data to be standardized and linearized for principal component analysis [58,59], we propose a kernel function to process the input samples:

\tilde{X}_{ij} = \frac{\ln x_{ij} - M_i}{S_i},

where \tilde{X}_{ij} is the j-th sample of the i-th index after the original matrix has been standardized and linearized, M_i is the statistical average of the i-th index, and S_i is the corresponding standard deviation.

(II) To explore the independence between training samples, we construct the covariance matrix of the standardized sample set [57]:

R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix}.

(III) The eigenvalue decomposition of R is computed as R = P \Lambda P^T, where \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) is a diagonal matrix with the eigenvalues of R and P collects the corresponding orthonormal eigenvectors.

(IV) Finally, the contribution rate of a training sample and the cumulative contribution rate of a training sample set are calculated by

\eta_j = \frac{\lambda_j}{\sum_{k=1}^{n} \lambda_k}, \qquad \eta(t) = \sum_{j=1}^{t} \eta_j.

Through the above four steps, we obtain the contribution rate and the cumulative contribution rate of the training sample and the training sample set.
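Steps (I)-(IV) condense to a few lines of numpy (a sketch under our reading of the method: the log transform stands in for the kernel function motivated by the logarithmic drag distribution, and the contribution rates are computed from the covariance eigenvalues; all names are ours):

```python
import numpy as np

def contribution_rates(X):
    """Contribution and cumulative contribution rates from a positive
    sample matrix X (m indices x n samples)."""
    Z = np.log(X)                                  # step (I): log kernel
    Z = (Z - Z.mean(axis=1, keepdims=True)) / Z.std(axis=1, keepdims=True)
    R = np.cov(Z)                                  # step (II): covariance matrix
    lam = np.linalg.eigvalsh(R)[::-1]              # step (III): eigenvalues, descending
    eta = lam / lam.sum()                          # step (IV): contribution rates
    return eta, np.cumsum(eta)                     # ... and cumulative rates
```

By construction the rates are nonnegative, sorted in decreasing order, and sum to one, so a cumulative-rate threshold selects the most informative directions first.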

Adaptive Searching of Optimal Contribution Rate
In the previous section, we introduced the concept of the contribution rate of the training samples. To effectively improve the training efficiency within a certain error tolerance, one needs to analyze the training sample set and use the most valuable information to train the ANN, while excluding from the training some samples with a low contribution rate. In view of this, an adaptive searching method for the optimal contribution rate of the training dataset needs to be designed, with the aim of maximizing the computational efficiency of the ANN within the given error tolerance of the system.
To establish the adaptive searching algorithm of the optimal contribution rate in the training dataset, the basic idea of the bisection algorithm for solving nonlinear equations is employed. The algorithm can be divided into four steps:

(I) First, set the error tolerance of the system as

E_{tol} = K \cdot E_{MSE,total}.

Since the mean square error (MSE) grows approximately in step with the mean absolute error (MAE) and the mean relative error (MRE) in data fitting, and better reflects the dispersion of the prediction errors, the MSE is adopted as the error standard. K denotes the coefficient of error, and E_{MSE,total}, E_{MAE,total}, and E_{MRE,total} denote the MSE, MAE, and MRE of the system [57], respectively:

E_{MSE,total} = \frac{1}{N} \sum_{i=1}^{N} (F_i - \hat{F}_i)^2, \qquad E_{MAE,total} = \frac{1}{N} \sum_{i=1}^{N} \left| F_i - \hat{F}_i \right|, \qquad E_{MRE,total} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| F_i - \hat{F}_i \right|}{\left| F_i \right|},

where F_i and \hat{F}_i are the target and predicted drag forces and N is the number of samples.

(II) The contribution rate interval [0, 100] is divided into two equal subintervals, [0, 50] and [50, 100]. If the E_{MSE} at the midpoint 50 meets the given error tolerance (0, K \cdot E_{MSE,total}), the left interval [0, 50] is selected; otherwise, the right interval [50, 100] is selected.
(III) Check whether the difference between the right and left ends of the interval meets the required accuracy of the solution. If so, go to step (IV); otherwise, return to step (II) and divide the interval again until the accuracy is met.
(IV) Output the midpoint of the interval that meets the accuracy. In Figure 3, I low denotes the left end of the contribution rate interval, I high denotes the right end of the contribution rate interval, I mid denotes the midpoint of the contribution rate interval, K denotes the coefficient of error, E MSE,total denotes the MSE of the system, and E MSE,mid denotes the MSE of the midpoint in the contribution rate interval. One of the main advantages of the OCRA is the adaptability. This algorithm can identify the training sample set with the optimal contribution rate, where the given error tolerance of the system is met automatically.
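The four steps reduce to a standard bisection over the cumulative-contribution-rate interval. In the sketch below, `mse_at` stands in for retraining the ANN on the subset reaching a given cumulative rate and returning its MSE; it is a hypothetical hook supplied by the caller, not the paper's code:

```python
def ocra_bisection(mse_at, E_tol, tol=1.0):
    """Bisect [0, 100] for the smallest cumulative contribution rate whose
    training subset meets the MSE tolerance E_tol = K * E_MSE_total."""
    lo, hi = 0.0, 100.0
    while hi - lo > tol:               # step (III): accuracy check
        mid = 0.5 * (lo + hi)
        if mse_at(mid) <= E_tol:       # step (II): tolerance met at midpoint
            hi = mid                   # keep the left interval (fewer samples)
        else:
            lo = mid                   # keep the right interval (more samples)
    return 0.5 * (lo + hi)             # step (IV): output the midpoint
```

Each iteration halves the interval, so the search terminates after about log2(100/tol) retrainings regardless of the dataset size.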
The verification that the system can identify the optimal contribution sample set automatically via the above-mentioned four steps will be demonstrated in the next section.

Drag Model without OCRA
The learning process of the ANN can be regarded as the process of finding the optimal solution in the given error tolerance of the system. The better compatibility and self-learning capabilities of ANN, which are different from traditional regression methods, are promising to obtain the implicit connections and relationships among variables under different granular sizes. Because the difference of granular size will bring obvious heterogeneity in the granular flow, the drag force model of a polydisperse system has complex and implicit nonlinear characteristics. Generally, it has great advantages for the application of ANN to establish the polydisperse drag model.
Since a small training sample set produces large deviations in ANN training, it is not reasonable to use one in the training process. Meanwhile, the training sample set should not be too large either, considering computational efficiency and over-fitting. To find a reasonable training sample set, we start with 990 samples and increase the number of samples over intervals of size δRe_p = 100 and δε_s = 0.15 to search for the optimal number of training samples in each interval [Re_p,k, Re_p,k+1] × [ε_s,l, ε_s,l+1] (k = 1, 2, ..., 11; l = 1, 2, ..., 4). The specific searching process and results are illustrated in Appendix B.
After preprocessing the training sample set, a single-hidden-layer BP neural network with 15 neurons is used to fit the training data. The LM algorithm and the Tanh function are used as the learning method and activation function, respectively, with the input and output variables shown in Figure 2. Since the prediction results are similar across void fractions, we only show the drag force of granules of different sizes when ε_g = 0.35. Figure 4 presents the prediction results of the neural network; in general, it can accurately describe the drag force of the polydisperse granular system. Still, there is an accuracy difference in the prediction for different-sized granules. The ANN has good prediction accuracy for larger granules (see Figure 4c,d), but it produces certain deviations for smaller granules, and these deviations become more obvious at larger Reynolds numbers. The main reason is perhaps that the motion of smaller granules is prone to be influenced by larger granules in polydisperse granular flow, leading to irregular features in the smaller granules' motion and drastic fluctuations in the granular drag force. Furthermore, smaller granules distributed among larger ones produce heterogeneous distributions, which cause strongly nonlinear behavior in the drag force distribution, so a complicated, non-smooth functional relationship between the drag force and the other variables appears. The ANN model achieves more accurate predictions for smooth, flat functional relationships, whereas for input-output samples with complicated non-smooth relationships it may produce a certain deviation, and this deviation grows as the functional relationship becomes more complex. Thus, there are certain deviations in the prediction of the drag force for smaller granules.
These deviations are difficult to eliminate by considering only the commonly used inputs, such as the mathematical expectation and variance of the granular diameters. Thus, it is worth conducting more in-depth studies to search for more efficient ANN inputs in future work.

Adaptive Advantages of OCRA
To verify that the OCRA can find the most valuable training samples adaptively under a given number of samples, we first randomly select 1000, 1300, and 1600 samples without the OCRA for ANN training; these are named random training samples. Then, 1000 training samples selected by the OCRA, named adaptive training samples, are employed to train the ANN. The conditions of the numerical experiment are the same as in Section 4.1. The prediction accuracy and CPU time of the different training samples in ANN prediction for ε_g = 0.35 are shown in Figure 5 and Table 7.
As shown in Figure 5, the training error of the ANN for Re_p ∈ [0.008, 1100] decreases obviously when the number of training samples increases from 1000 to 1300. However, when the training samples increase from 1300 to 1600, the prediction error of the ANN drag model decreases only a little. It can be concluded that effective samples account for a large proportion of the samples added from 1000 to 1300, which makes the training error of the ANN decrease rapidly, whereas among the samples added from 1300 to 1600, ineffective samples, i.e., redundant and singular samples, account for the majority; these samples have no obvious effect on the training accuracy and may even reduce it. The results demonstrate that a large original training sample set will inevitably contain some redundant and singular samples, and these "unusual samples" cause repeated training and even over-fitting of the ANN in certain prediction zones. It can also be inferred from Figure 5 that the prediction accuracy of 1000 samples selected by the OCRA is close to that of 1300 random samples, and much higher than that of 1000 random samples. Yet, the CPU time consumed for ANN training with the 1000 adaptive samples, as shown in Table 7, is 21% less than that of the 1300 random training samples. We conclude that the OCRA can find the most valuable training samples for improving the training efficiency of the ANN drag model, while a certain number of low-contribution-rate training samples are excluded by this adaptive algorithm, so the computational efficiency of the ANN drag model is improved obviously. In general, the OCRA is an effective tool for excluding redundant and singular samples in ANN training.

Adaptive Advantages of OCRA
To verify that the OCRA can find the most valuable training samples adaptively under the given number of samples, we first randomly select 1000, 1300, and 1600 samples without OCRA for ANN training.     As shown in Figure 5, the training error of ANN at [0.008,1100] ∈ p Re decreases obviously when the number of training samples increases from 1000 to 1300. However, when the training samples increase from 1300 to 1600, the prediction error of the ANN drag model decreases a little. It can be concluded that the effective samples account for a large proportion in the training samples from 1000 to 1300, so it makes the training error of ANN decrease rapidly. However, in the training samples from 1300 to 1600, the ineffective samples, which are redundant and singular samples, account for the mainstream; these samples have no obvious effect on the improvement of the training accuracy and even reduce the training accuracy of the ANN. The results demonstrate that a large original training sample set will inevitably have some redundant and singular samples, and these "unusual samples" will cause repeat training and even over-fitting problems of the ANN in a certain prediction zone. It still can be inferred from Figure 5 that the prediction accuracy of 1000 samples by using the OCRA is close to that of 1300 random samples, which is much higher than the accuracy of 1000 random samples. Yet, the CPU consumed time for ANN training by 1000 adaptive samples, as shown in Table 7, is 21% less than that of 1300 random training samples. The conclusion can be drawn that the OCRA can find the most valuable training samples for the improvement of training efficiency in the ANN drag model. Simultaneously, a certain number of low-contribution-rate training samples are excluded by this adaptive algorithm. As a matter of fact, the computational efficiency of the ANN drag model can be improved obviously. 
In general, the OCRA is an effective tool to exclude the redundant and singular samples in ANN training.
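The filtering idea described above can be sketched as a ranking by per-sample contribution: keep the highest-contribution samples until a target share of the total contribution is reached. This is a minimal illustration under assumed inputs; how the per-sample contribution scores are computed (the paper derives them within the OCRA) is not reproduced here, so the scores below are hypothetical.

```python
import numpy as np

def select_by_contribution(samples, contributions, target_rate):
    """Keep the highest-contribution samples until their cumulative share
    of the total contribution reaches target_rate (e.g. 0.914).
    contributions: one non-negative score per sample (hypothetical here;
    the OCRA itself defines how such scores are obtained)."""
    order = np.argsort(contributions)[::-1]               # most valuable first
    cum = np.cumsum(contributions[order]) / contributions.sum()
    n_keep = int(np.searchsorted(cum, target_rate)) + 1   # smallest prefix reaching the target
    return samples[order[:n_keep]]

# toy illustration: 4 samples with descending usefulness
scores = np.array([4.0, 3.0, 2.0, 1.0])
kept = select_by_contribution(np.arange(4), scores, 0.9)  # keeps the top 3 samples
```

Low-contribution (redundant or singular) samples fall off the tail of the ranking, which is the exclusion behavior attributed to the OCRA in the text.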

Drag Model with OCRA
So far, we have established an ANN drag model of a polydisperse granular system in Section 4.1. To verify the feasibility of the OCRA in the ANN drag model, an ANN drag model with OCRA is established in this section, and its predicted results are analyzed.
First, the prediction accuracy of the ANN with OCRA needs to be explored. We use 1290 random training samples and 1000 adaptive training samples to train the ANN and the ANN with OCRA, respectively. The two drag models are then employed to predict the dimensionless drag force of a polydisperse granular system. The experimental conditions are the same as in Section 4.1, and the weight matrices and biases of the trained neural network, from the input layer to the hidden layer and from the hidden layer to the output layer, are listed in Appendix C.
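Since Appendix C lists weight matrices and biases for the input-to-hidden and hidden-to-output layers, the trained network is a single-hidden-layer model. A minimal sketch of such a forward pass follows; the layer sizes, the tanh hidden activation, the linear output, and the random weights are illustrative assumptions, not the paper's trained parameters.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer forward pass: one weight matrix/bias pair from
    input to hidden layer and one from hidden to output, matching the
    structure described in the text. tanh hidden activation and a linear
    output (usual for regression) are assumed."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 8                   # e.g. inputs (eps_s, u_slip, <d>); sizes are illustrative
W1 = rng.normal(size=(n_hidden, n_in))  # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(1, n_hidden))     # hidden -> output weights
b2 = np.zeros(1)
F = mlp_forward(np.array([0.3, 0.1, 1.0]), W1, b1, W2, b2)  # scalar drag-force output
```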
After comparing the predicted results at different void fractions of the granular system, we find that in a relatively dilute granular system, the ANN with OCRA outperforms the plain ANN over a wide range of granule sizes. Representative predicted results, for ε g = 0.7, are shown in Figure 6. In a relatively dense granular system, the ANN with OCRA and the plain ANN have similar prediction accuracy in general (see Table 8); each has its own advantages. Figure 7 shows the drag force predictions of the granular system when ε g = 0.35.
Figure 6. Dimensionless drag force prediction when ε g = 0.7.
As shown in Figure 7, when the ANN and the ANN with OCRA predict the drag force of medium-sized granules at ε g = 0.35, the overall prediction capability of the two drag models is approximately the same. Specifically, for the granule size d = 3d 0 , the prediction accuracy of the ANN with OCRA is slightly better than that of the ANN. For the granule size d = 4d 0 , the ANN with OCRA and the ANN have similar prediction accuracy in general. However, for smaller and larger granules (d = 2d 0 , d = 5d 0 ), the ANN gives better results than the ANN with OCRA drag model. Overall, the prediction accuracy of the ANN with OCRA is better than that of the ANN in the polydisperse granular system. The results show that, although a certain portion of the samples is excluded in the ANN with OCRA model, the prediction accuracy still increases. It can be inferred that the OCRA plays an active role in preventing over-fitting and repeated ineffective training of the ANN drag model. Meanwhile, compared to the DNS results, the ANN with OCRA generally shows larger prediction deviations than the ANN for the granule sizes d = 2d 0 and d = 5d 0 (see Figure 7). The reason may be that, owing to the filtering of the OCRA, the loss of information on larger and smaller granules during training produces a certain prediction deviation. However, the loss of information on medium-sized granules in the ANN with OCRA training does not significantly affect the predicted results, probably because the large number of such granules allows the lost information to be easily compensated for by other, similar medium-sized granules. On the other hand, the smaller and larger granules can hardly achieve this compensation because of their smaller numbers, and this phenomenon worsens in denser granular systems that are dominated by the heterogeneous configuration of granular clusters.
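The accuracy comparisons above, and the MRE values reported in the tables, rest on standard error measures. For reference, a minimal sketch of the MSE, MAE, and MRE of a prediction against DNS reference values (the toy numbers are purely illustrative):

```python
import numpy as np

def mse(pred, ref):
    """Mean squared error."""
    return float(np.mean((pred - ref) ** 2))

def mae(pred, ref):
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - ref)))

def mre(pred, ref):
    """Mean relative error with respect to the reference (DNS) values."""
    return float(np.mean(np.abs(pred - ref) / np.abs(ref)))

# toy check against "DNS" values
ref = np.array([2.0, 4.0])
pred = np.array([2.2, 3.8])
```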
Nevertheless, considering that a large part of the flow in nature is relatively dilute, the ANN with OCRA can be a suitable tool to capture the characteristics of such flows.
Next, the efficiency improvement brought by the OCRA in establishing the ANN drag model needs to be explored. We use the training error of the 1290 samples as the standard error E MSE,1290 (the MSE of 1290 samples) and search for the training set with the optimal contribution rate of the system under the given error tolerance, i.e., (0, 1.8 * E MSE,1290 ); the OCRA is used to search for the most valuable samples. The specific search process is recorded in Table 9. Table 9 shows that the OCRA found the optimal contribution rate of the system within the given error tolerance (0, 1.8 * E MSE,1290 ), which is 0.9140625, and that the optimal training set contains 1052 samples, found in 7 steps. The optimal contribution sample set, which accounts for only 82% of the original data volume, provides a 91.4% contribution, so the computational efficiency of the system is improved appreciably. It can be concluded that using the OCRA to train the ANN drag model effectively reduces the number of training samples without significantly increasing the system errors, thereby improving the computational efficiency of the ANN drag model.
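The reported optimal contribution rate of 0.9140625 reached in 7 steps is consistent with a bisection on the interval [0, 1] (0.9140625 = 117/128), although the paper's exact search procedure is not reproduced here. A sketch under that assumption, with a hypothetical `error_at` callback standing in for "retrain the ANN on the set selected at this rate and measure its training error":

```python
def search_contribution_rate(error_at, tol_error, steps=7, lo=0.0, hi=1.0):
    """Bisection on the contribution rate: keep the smallest rate (i.e.
    fewest samples) whose retrained-model error stays within tol_error.
    error_at(rate) is a hypothetical callback that retrains and returns
    the training error at that rate."""
    best = hi
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if error_at(mid) <= tol_error:
            best, hi = mid, mid   # within tolerance: try keeping fewer samples
        else:
            lo = mid              # too inaccurate: keep more samples
    return best

# toy stand-in for "retrain and measure": error falls monotonically with the rate
rate = search_contribution_rate(lambda r: 1.0 - r, 0.09)
```

With this monotone toy error curve, the 7-step bisection lands on 0.9140625, matching the value reported in Table 9.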

Drag Model Validation
To verify the validity of the ANN with OCRA drag model in the polydisperse system, we compare its prediction results with those of the well-known BVK formula [27], the Cello et al. formula [21], and the HS formula [30] in polydisperse and monodisperse systems. The empirical formulas and prediction results are shown in Table 10 and Figures 8 and 9.
Table 10. Empirical drag formulas for comparison.
As shown in Figure 8a,b, when we examine the drag force prediction at ε g = 0.35 in the polydisperse system, the ANN with OCRA model achieves the best prediction accuracy for medium-sized granules (d = 3d 0 ) compared to the Cello et al. and HS formulas; the BVK formula has the lowest accuracy of the four drag models. For the larger-sized granules (d = 4d 0 ), Cello et al.'s formula shows the best performance among the four drag models, and its advantage becomes more obvious when Re p ∈ [300, 550]. On the other hand, the ANN with OCRA has the lowest prediction accuracy compared to the HS and BVK drag models. However, when the four drag models are considered in a dilute polydisperse granular system (ε g = 0.7), the accuracy of the ANN with OCRA drag model is generally better than that of the HS, Cello et al., and BVK models. In addition, when we examine the drag prediction of these models in a monodisperse granular system (see Figure 9 and Table 11), the accuracy of the ANN with OCRA is better than that of the BVK and HS drag models for medium-sized granules (d = 3d 0 ) with void fractions ε g = 0.5 and ε g = 0.85, while the Cello et al. drag model has the lowest prediction accuracy. Similar results are found for the larger granules (d = 4d 0 ) of the monodisperse granular system: the ANN with OCRA has the best prediction accuracy, followed by the HS drag model, and the BVK and Cello et al. drag models have the lowest prediction accuracy for these granules at ε g = 0.5 and ε g = 0.85, respectively.
Figure 9. Predictions of different drag models in a monodisperse system [21,27,30].
Table 11. MRE of each drag model in Figure 9.
The simulation results demonstrate that the ANN with OCRA has both advantages and disadvantages under different physical conditions. Regarding the advantages, first, the ANN with OCRA drag model has obvious advantages in predicting the drag force of medium-sized granules in the polydisperse granular system compared to the other three analytic drag formulas. Second, the ANN with OCRA performs well in predicting larger-sized granules in a relatively dilute polydisperse granular system. Finally, the performance of the ANN with OCRA is generally satisfactory in a monodisperse granular system. Regarding the disadvantages, the ANN with OCRA does not achieve ideal predicted results for larger granules in a relatively dense granular system; in that case, it is advisable to adopt Cello et al.'s drag model for prediction. Therefore, it can be concluded that the ANN with OCRA drag model is good at predicting polydisperse granules over a relatively wide concentration range and has better compatibility with a monodisperse granular system.
We infer three possible reasons for this. First, the other three formulas have fixed analytic structures, and a fixed functional structure indeed has difficulty accurately describing the characteristics of two different flow systems. Second, the ANN has the advantages of self-learning ability and good compatibility; a drag force model based on the ANN can more easily switch accurately between monodisperse and polydisperse systems. Third, the OCRA removes redundant and singular samples from the training sample set, which reduces the probability of over-fitting in the ANN sample fitting. Therefore, within the accepted tolerance of system errors, the ANN with OCRA drag model has a wide range of applicability in the polydisperse granular system and is also compatible with monodisperse systems. All in all, considering the performance of each drag model, the ANN with OCRA is suggested for the prediction of relatively dilute granular systems and of dense granular systems with medium-sized granules; Cello et al.'s drag model is suggested for relatively dense granular systems with larger granules; and the HS drag model is more applicable to monodisperse granular systems.
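The model recommendations above can be condensed into a simple regime dispatch. This is only a summary of the qualitative advice in the text; the "medium"/"large" granule labels and the ε g = 0.5 dilute/dense split are illustrative simplifications, not thresholds from the paper.

```python
def recommend_drag_model(eps_g, granule_class, monodisperse=False):
    """Regime-based choice of drag model, summarizing the qualitative
    recommendations in the text. eps_g: void fraction; granule_class:
    'medium' or 'large' (illustrative labels). The eps_g >= 0.5 cut for
    'relatively dilute' is an assumed simplification."""
    if monodisperse:
        return "HS"                     # HS suggested for monodisperse systems
    if eps_g >= 0.5:                    # relatively dilute polydisperse flow
        return "ANN+OCRA"
    # relatively dense polydisperse flow
    return "ANN+OCRA" if granule_class == "medium" else "Cello"
```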

Conclusions
A reasonable ANN model has been successfully established for describing the drag force in polydisperse granular flows. To accomplish this task, the 3D LBM for a polydisperse granular system is used to obtain diverse training data, and then the structures of the ANN, including the number of network layers, the learning algorithm, the number of neurons in the hidden layers, the activation function, and the input variables, were comprehensively discussed with regard to the computational efficiency and the Spearman correlation coefficient between the outputs and the expected ones. For the training of the ANN, a new adaptive algorithm for the polydisperse drag model was proposed to improve the training efficiency and avoid the over-fitting problem, whereby redundant and singular samples can be automatically eliminated from the training set. Comparison and analysis of the simulation results show that the ANN drag model has good prediction ability over a relatively wide range of solid concentrations (ε s from 0.05 upward).
In brief, the ANN has obvious advantages in the construction of the drag model of the polydisperse granular system, considering the complex non-linear relationships between the variables. The verified structure of the ANN drag model can effectively realize drag prediction in both monodisperse and polydisperse granular systems. Moreover, using the OCRA to train the ANN drag model can obviously improve the computational efficiency without significantly increasing the system errors. Nevertheless, the prediction accuracy of the ANN with OCRA for larger granules in a relatively dense system still needs to be improved; in light of these imperfections, a more novel structure and a more efficient training strategy for the ANN drag model can be considered in future works.

Data Availability Statement:
Data will be available upon reasonable request.

Acknowledgments:
The authors would like to thank the anonymous referees, whose valuable comments helped to improve this paper. The authors extend their gratitude to Hussein A. H. Muhammed for his careful checking and improving of the language expression of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

The Impact of Input and Output Correlation
To construct a nonlinear mapping between the inputs and the expected outputs using the ANN, it is necessary to discuss the correlation between different input and output variables. Taking the particle Reynolds number Re p given in Formula (8) as an example, Re p is a function of the solid volume fraction ε s and the slip velocity u slip . Table A1 lists the relationship between the input and output variables of the ANN.
Table A1. The relationship between input and output variables.

Combination  Input                              Output
A            ε s , u slip , < d >               Re p
B            ε s , v s,loc , v g,loc , < d >    Re p

Table A2 shows the Spearman correlation coefficients between the outputs and the expected outputs (results of DNS) for four independent experiments. The results show that combination A is much more efficient than combination B.
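The comparison in Table A2 uses the Spearman correlation coefficient. For reference, a minimal sketch of that coefficient as the Pearson correlation of ranks, valid only when there are no tied values (a library routine such as `scipy.stats.spearmanr` also handles ties):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks.
    The double-argsort ranking assumes no tied values in x or y."""
    rx = np.argsort(np.argsort(x)).astype(float)  # ranks 0..n-1
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# perfectly monotone pairs give +1, reversed order gives -1
rho_up = spearman(np.array([1.0, 2.0, 3.0, 4.0]), np.array([10.0, 20.0, 30.0, 40.0]))
rho_down = spearman(np.array([1.0, 2.0, 3.0, 4.0]), np.array([40.0, 30.0, 20.0, 10.0]))
```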

Appendix B
The Searching Process for the Optimal Number of Training Samples

To find a reasonable training sample set, we start with 990 samples and increase the sample count with a fixed step size. The E MSE,avg , E MAE,avg , E MRE,avg , and CPU time of the trained ANN are given in Table A3 for Re p ∈ [100, 200] and ε g ∈ [0.35, 0.5]. E MSE,avg , E MAE,avg , and E MRE,avg denote the average MSE, MAE, and MRE over all granule sizes, respectively. The CPU time is the sum of the training and prediction times consumed by the ANN when the error limit is set to ε = 10 −3 .
Figure A1. The relationship between training error and sample size [33].
As shown in Table A3, when the number of training samples increases from 990, the training errors E MSE,avg , E MAE,avg , and E MRE,avg of the system decrease gradually. When the number of training samples approaches 1290, the reduction of the training error gradually slackens, but the consumed CPU time increases rapidly. Considering the computational efficiency of the system and similar results in other intervals, 1290 training samples are adopted as the training dataset in each interval.
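The stopping criterion described above (error reduction slackens while CPU time grows) can be sketched as a marginal-gain cutoff over the sample-size sweep. The 5% threshold and the toy error curve below are illustrative assumptions, not values from Table A3.

```python
def choose_sample_size(sizes, errors, min_rel_gain=0.05):
    """Return the first sample size after which adding more samples no
    longer reduces the training error by at least min_rel_gain
    (relative), mirroring the 'reduction slackens' criterion.
    sizes and errors are aligned, with sizes increasing."""
    for i in range(1, len(sizes)):
        rel_gain = (errors[i - 1] - errors[i]) / errors[i - 1]
        if rel_gain < min_rel_gain:
            return sizes[i - 1]
    return sizes[-1]

# toy error curve: fast improvement, then a plateau after 1290
sizes = [990, 1090, 1190, 1290, 1390]
errors = [1.00, 0.70, 0.50, 0.40, 0.39]
chosen = choose_sample_size(sizes, errors)
```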

Appendix C
Table A4. ANN simulation parameters [61].