2.1. Discrete Element Method (DEM)
The Discrete Element Method (DEM) was initially suggested by Cundall and Strack [27] to model the mechanical behavior of granular flows and to simulate the forces acting on each particle and its motion. In DEM, particle motion is typically decomposed into two types: translation and rotation. Particles exchange momentum and energy during collisions with their neighbors or a boundary wall (contact forces), through particle-fluid interactions, and through gravity. By applying Newton's second law of motion, we can determine the trajectory of each particle i (including its acceleration, velocity, and position) from the following equations:

m_i (dv_i/dt) = Σ_k f_{i,k} + f_i^f + m_i g   (3)

I_i (dω_i/dt) = Σ_k (d_i/2) n_{i,k} × f_{i,k}   (4)

where m_i = the mass of particle i; v_i = the velocity of the particle; g = gravitational acceleration; f_{i,k} = the interaction force between particle i and particle k (contact force); f_i^f = the interaction force between particle i and the fluid; I_i = moment of inertia; ω_i = angular velocity; d_i = diameter of grain i; and n_{i,k} = the directional contact vector connecting the centers of grains i and k.
We use a contact force model based on the spring-dashpot principle, following the suggestions of Hertz-Mindlin [28]. The contact force is obtained from a force analysis; the stiffness and damping factors are resolved in two directions, normal and tangential to the contact surface between the two grains (Figure 2):

f_{i,k}^(n) = k_i^(n) δ_{i,k}^(n) + α_i^(n) Δu_i^(n)   (5)

f_{i,k}^(τ) = k_i^(τ) δ_{i,k}^(τ) + α_i^(τ) Δu_i^(τ)   (6)

where (n) and (τ) denote the two components of the contact force in the normal and tangential directions; k_i = stiffness of grain i; δ_{i,k} = the characteristic contact displacement (also called the length of the spring in each of the two directions above); α_i = damping coefficient; and Δu_i = the relative velocity of the grains at the moment of collision. Following Coulomb, the maximum tangential friction force is the product of the friction coefficient μ and the normal force component. In the nonlinear Hertz-Mindlin contact model, the tangential force component increases until the ratio f^(τ)/f^(n) reaches the value μ, and it retains this maximum value until the particles are no longer in contact with each other. Details of the force models, as well as the method for determining the relevant coefficients, can be found in Reference [28].
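As an illustration, the spring-dashpot force evaluation with the Coulomb cap on the tangential component can be sketched in Python. This is a minimal sketch; the stiffness, damping, and friction values are placeholder assumptions, not the calibrated coefficients of Reference [28]:

```python
import numpy as np

def contact_force(overlap_n, rel_vel_n, tang_disp, rel_vel_t,
                  k_n=1e5, k_t=8e4, alpha_n=50.0, alpha_t=40.0, mu=0.5):
    """Linear spring-dashpot contact force with a Coulomb friction cap.

    overlap_n : normal overlap between the two grains
    rel_vel_n : normal relative velocity at the contact
    tang_disp : accumulated tangential spring displacement
    rel_vel_t : tangential relative velocity at the contact
    Returns (f_n, f_t): normal and tangential force magnitudes.
    """
    # Normal component: spring plus dashpot
    f_n = k_n * overlap_n + alpha_n * rel_vel_n
    # Tangential component: spring plus dashpot
    f_t = k_t * tang_disp + alpha_t * rel_vel_t
    # Coulomb cap: the tangential force may not exceed mu times the normal force
    f_t = np.clip(f_t, -mu * f_n, mu * f_n)
    return f_n, f_t
```

Once the tangential spring force reaches the Coulomb limit, it is held at that maximum until the grains separate, mirroring the behavior described above.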
After calculating all forces acting on the sediment particles, as well as the velocity and position of each particle at the previous time step, we can determine the current velocity and position of each grain by solving Equations (3) and (4). The grain size distribution, as well as the bed porosity for the whole domain, can then be defined. As a result, we can also estimate the exchange rate of the fine fraction between different bed layers.
The DEM simulations begin with defining the system geometry. This comprises boundary conditions, particle coordinates, and material properties, the latter by identifying the contact model parameters, such as the friction and stiffness coefficients. How loading or deformation occurs within the system can be determined by the user through adding loads, deformations, or settlements. The simulation begins as either a transient or dynamic analysis and runs until a defined number of time steps is completed. An overlap check procedure, based on the geometry and coordinates of the particles, starts after the particles are inserted into the simulation box. Once the simulation of motion starts, particles that physically encounter each other are detected, and the contact forces are calculated at each time step. The magnitude of the particle forces is related to the distance between each pair of contacting particles. From these data, the resultant force, including body forces, external forces, and the moment acting on each particle, can be calculated.
Moreover, two sets of equations for the dynamic equilibrium of the particles are computed, except when particle rotation is blocked. Each particle's translational movement is derived from the resultant applied force, and each particle's rotational movement is formulated from the resultant applied moment. Knowing the inertia of the particles, the translational and rotational accelerations can be calculated. After new contact forces are determined, the particle positions and orientations are updated, ready for the next time step; this is repeated for all time steps. While the system may seem to respond in an almost static manner, the Discrete Element Method is a transient or dynamic analysis.
Figure 3 shows the series of calculations that occur within a given time step. Particle velocities and incremental displacements are calculated first. Here, the equilibrium of each particle in the sequence is considered. In the second series of calculations, once the system geometry has been updated, the forces at each contact in the whole system are calculated. The particle rotational moment is produced from the normal contact force, as well as the tangential component of the contact force. From these calculated moments and forces, the new particle positions are generated for the next time step, and the series of calculations begins again.

For every particle-based DEM simulation, the following fundamental assumptions are accepted. The first is that particles are rigid, each possessing a finite inertia that can be described analytically. Moreover, the particles can translate and rotate independently of each other. New particle contacts are detected automatically by a geometry check algorithm. Physical contact between particles normally happens over an infinitesimally small area, based on the allowed overlap, and involves only two particles. Particles that interact in DEM simulations are allowed to overlap slightly at the contact point, where the magnitude of the overlap is required to be small. The compressive inter-particle forces can be calculated from the overlap value. Tensile and compressive forces can be transferred at particle contact points: a normal force acts in the direction of contact, along with a tangential force orthogonal to the normal contact force. Furthermore, there is a separation distance between two parting particles over which tensile inter-particle forces are calculated. When particles collide, this force is at its maximum value; as the particles move away from each other, the contact area diminishes to zero and is no longer used in contact force calculations.
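The per-time-step update described above can be sketched as a short explicit integration routine. This is a minimal sketch; `forces_fn`, which stands in for the contact detection and force calculation, is a hypothetical placeholder:

```python
import numpy as np

def dem_step(pos, vel, forces_fn, mass, dt):
    """One explicit DEM time step: compute forces, then integrate
    Newton's second law to update velocities and positions.

    pos, vel  : (N, 3) arrays of particle positions and velocities
    forces_fn : function returning the (N, 3) resultant force on each
                particle (contact + fluid + body forces); a placeholder
    mass      : (N,) array of particle masses
    dt        : time step (a fraction of the Rayleigh time)
    """
    f = forces_fn(pos, vel)          # contact detection and force calculation
    acc = f / mass[:, None]          # Newton's second law, a = F/m
    vel = vel + acc * dt             # velocity update (acceleration assumed constant)
    pos = pos + vel * dt             # position update for the next step
    return pos, vel
```

Velocity and acceleration are held constant within each step, which is why the time step must be kept short.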
The last key assumption is that clusters of the rigid base particles can be used to represent a single particle. A measurable deformation of the composite particles is caused by the relative motion of the base particles within the cluster. These particle agglomerates may also be rigid themselves.
In this study, we used the open-source Discrete Element Method Particle Simulation software LIGGGHTS, which implements a Hertz-Mindlin granular contact model [28,30,31]; grains are modeled as compressible spheres with a diameter d that interact, when in contact, via the Hertz-Mindlin model [28,30,31]. An algorithm was developed to calculate the grain size distribution and porosity from the computed locations and diameters of the grains.
Defining the simulation time step is one of the essential steps in setting up a DEM model. Sufficiently short time steps ensure the stability of the system and enable simulation of the real processes. According to Johnson [27,28], disturbances that occur during the motion of particles in a granular system propagate as Rayleigh waves along the surface of the solid. The simulation time step is tied to the Rayleigh time, which is the time an energy wave takes to traverse the smallest element in the system. The simulation time step should be small enough that any disturbance of a particle's motion propagates only to its nearest neighbors. Velocity and acceleration are assumed to be constant during the time step. Moreover, the time step duration should be smaller than the critical time increment evaluated from theory. Several equations have been proposed for calculating a critical time step [27]. In this study, we applied a time step of 0.00001 s, which is smaller than 20 percent of the Rayleigh time.
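The Rayleigh-time criterion can be checked with a short calculation. This sketch uses the common DEM estimate t_R = πR√(ρ/G)/(0.1631ν + 0.8766); the material values below are illustrative quartz-sand-like assumptions, not the parameters of this study:

```python
import math

def rayleigh_time(radius, density, shear_modulus, poisson):
    """Rayleigh time: time for a Rayleigh surface wave to cross the
    smallest particle; the DEM time step is taken as a fraction of it."""
    return (math.pi * radius * math.sqrt(density / shear_modulus)
            / (0.1631 * poisson + 0.8766))

# Illustrative material values (assumptions for this sketch)
t_r = rayleigh_time(radius=0.5e-3, density=2650.0,
                    shear_modulus=2.9e10, poisson=0.25)
dt = 0.2 * t_r  # a common choice: 20 percent of the Rayleigh time
```

A stiffer material or a smaller grain shortens the Rayleigh time, which is why the smallest particle in the system governs the time step.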
2.2. Algorithms for Calculating Grain Size Distribution and Porosity of a Cross Section from DEM Results
The results obtained from LIGGGHTS, an open-source Discrete Element Method particle simulation software, contain the 3D locations and diameters of the grains. To calculate the porosity and grain size distribution, a simple algorithm was developed. We use K different planes with elevations z_k (k = 0, …, K), which intersect the spherical grain matrix. The diameter of the generated circle i (i = 1, …, n_k) depends on the spherical diameter and the relative position between the k-plane and grain i (Figure 4).
The diameter of each circle created by the intersection between plane k and grain i is calculated as:

d_{i,k} = √(d_i² − 4(z_k − z_i)²),  for |z_k − z_i| < d_i/2   (7)

where z_i is the elevation of the center of grain i. The total solid area A_{s,k} of all n_k grains in plane k is determined as:

A_{s,k} = Σ_{i=1}^{n_k} (π d_{i,k}²)/4   (8)

The total area A_t is calculated based on the shape generated by the plane k cut across the grain matrix, whereby the porosity of cross section k is calculated by the following equation:

p_k = 1 − A_{s,k}/A_t   (9)

To calculate the grain size distribution, the grains in cross-section k are divided into m_k size fractions with characteristic grain sizes D_j (j = 1, …, m_k) and D_j < D_{j+1}; the area of each fraction is then calculated by:

A_{j,k} = Σ_{D_{j−1} < d_i ≤ D_j} (π d_{i,k}²)/4   (10)

The fraction of class j in cross-section k is calculated by the following equation:

p_{j,k} = A_{j,k}/A_{s,k}   (11)
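The cross-section algorithm can be sketched in a few lines of NumPy. This is a sketch under our own naming assumptions; the total area A_t is supplied by the caller, since it depends on the shape cut through the grain matrix:

```python
import numpy as np

def cross_section(centers, diameters, z_k, A_t, bins):
    """Porosity and grain-size fractions of the cross section at elevation z_k.

    centers   : (N, 3) grain center coordinates from the DEM output
    diameters : (N,) grain diameters
    z_k       : elevation of the cutting plane k
    A_t       : total cross-sectional area cut through the grain matrix
    bins      : ascending characteristic grain sizes D_j
    """
    dz = z_k - centers[:, 2]
    cut = np.abs(dz) < diameters / 2              # grains intersected by plane k
    # Circle diameters from the sphere-plane intersection
    d_circ = np.sqrt(diameters[cut] ** 2 - 4.0 * dz[cut] ** 2)
    areas = np.pi / 4.0 * d_circ ** 2             # area of each circle
    A_s = areas.sum()                             # total solid area in the plane
    porosity = 1.0 - A_s / A_t
    # Classify each intersected grain by its sphere diameter, then sum areas
    cls = np.digitize(diameters[cut], bins)
    fractions = np.array([areas[cls == j].sum()
                          for j in range(len(bins) + 1)]) / A_s
    return porosity, fractions
```

Applying this at every plane z_k yields the porosity and grain size distribution profile over the bed depth.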
2.3. Feed Forward Neural Network (FNN)
Artificial Neural Network (ANN) is a general term encompassing many different network architectures. A Feedforward Neural Network (FNN) is an artificial neural network in which connections between nodes do not form a cycle [32]. The FNN was the first and simplest type of artificial neural network developed. Information in an FNN travels in only one direction, forward, from the input nodes, through the hidden nodes, to the output nodes. The most widely used FNN is the multilayer perceptron (MLP). An MLP model contains several artificial neurons, otherwise known as processing elements or nodes. A neuron is a mathematical expression that filters signals traveling through the net. An individual neuron receives its weighted inputs from the connected neurons of the previous layer, which are normally aggregated along with a bias unit. The purpose of the bias unit is to scale the input to a useful range to improve the convergence properties of the neural network. The combined summation is passed through a transfer function to generate the neuron output. Weighted connections modify the output as it is passed to neurons in the next layer, where the process is repeated. The weight vectors that connect the different network nodes are found through the so-called error back-propagation method. During training, these parameter values are varied so that the FNN output aligns with the measured output of a known dataset [33,34]. Changing the connection weights in the network according to an error minimization criterion achieves a trained response. Overfitting is avoided if a validation process is implemented during training. Once the network has been sufficiently trained to simulate the best response to input data, the network configuration is fixed and a test process is conducted to evaluate the performance of the FNN as a predictive tool [22].
In feed-forward networks (Figure 5), messages are passed forward only. A network with L layers has a parameter matrix W^(l) and a differentiable activation function f^(l) corresponding to the l-th layer. Given an input x ∈ R^d, the network outputs:

F(x) = h^(L)   (12)

where each h^(l) is defined recursively from the base case h^(0) = x as follows:

h^(l) = f^(l)(W^(l) h^(l−1)),  l = 1, …, L   (13)

The training process minimizes a loss function ℓ over labeled examples (x, y). The gradient of the squared loss ℓ = ½‖h^(L) − y‖² on (x, y) with respect to W^(L) is

∂ℓ/∂W^(L) = ((h^(L) − y) ⊙ f^(L)′(z^(L))) (h^(L−1))^T   (14)

The form mirrors the delta rule because z^(L) = W^(L) h^(L−1), where h^(L−1) does not involve W^(L). By defining the "error term"

δ^(L) = (h^(L) − y) ⊙ f^(L)′(z^(L)),

we can simplify Equation (14) as ∂ℓ/∂W^(L) = δ^(L) (h^(L−1))^T. Similarly, the gradient with respect to W^(l) for l < L can be verified to be

∂ℓ/∂W^(l) = δ^(l) (h^(l−1))^T,

where

δ^(l) = ((W^(l+1))^T δ^(l+1)) ⊙ f^(l)′(z^(l)).

Computing all gradients in a multi-layer network in this manner is commonly known as "backpropagation", which is just a special case of automatic differentiation. For concreteness, here is the backpropagation algorithm for an L-layer feedforward network with the squared loss:
Input: labeled example (x, y); parameters W^(1), …, W^(L)
Feedforward phase:
Set h^(0) = x, and for l = 1, …, L compute:
z^(l) = W^(l) h^(l−1),  h^(l) = f^(l)(z^(l))
Backpropagation phase:
Set δ^(L) = (h^(L) − y) ⊙ f^(L)′(z^(L)), and for l = L−1, …, 1 compute:
δ^(l) = ((W^(l+1))^T δ^(l+1)) ⊙ f^(l)′(z^(l))
Output: set ∂ℓ/∂W^(l) = δ^(l) (h^(l−1))^T for l = 1, …, L.
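The feedforward and backpropagation phases can be sketched in NumPy. This is a minimal sketch for the squared loss, with a tanh activation chosen purely for illustration:

```python
import numpy as np

def backprop(Ws, x, y, f=np.tanh, df=lambda z: 1 - np.tanh(z) ** 2):
    """Gradients of the squared loss 0.5*||h_L - y||^2 for an L-layer
    feedforward network h_l = f(W_l h_{l-1}) with h_0 = x."""
    # Feedforward phase: store pre-activations z_l and activations h_l
    hs, zs = [x], []
    for W in Ws:
        zs.append(W @ hs[-1])
        hs.append(f(zs[-1]))
    # Backpropagation phase: error terms delta_l, from layer L down to 1
    delta = (hs[-1] - y) * df(zs[-1])
    grads = [None] * len(Ws)
    grads[-1] = np.outer(delta, hs[-2])
    for l in range(len(Ws) - 2, -1, -1):
        delta = (Ws[l + 1].T @ delta) * df(zs[l])
        grads[l] = np.outer(delta, hs[l])
    return grads
```

Each returned gradient is the outer product of a layer's error term with the activation of the layer below, exactly as in the output step of the algorithm.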
The optimization algorithm (or optimizer) is the main approach used for training a machine-learning model to minimize its error rate. There are two metrics to determine the efficacy of an optimizer: speed of convergence (the process of reaching a global optimum for gradient descent) and generalization (the model's performance on new data). Popular algorithms such as Adaptive Moment Estimation (Adam) or Stochastic Gradient Descent (SGD) capably cover one or the other metric. The Adam optimizer, presented by Kingma and Ba [35], is extensively used for deep learning models requiring first-order gradient-based descent with small memory and the ability to compute adaptive learning rates for different parameters [36]. This method is computationally efficient, easy to implement, and has proven to perform better than the RMSprop and Rprop optimizers [37]. Gradient rescaling is reliant on the magnitudes of parameter updates, and the Adam optimizer does not require a stationary objective and can work with sparse gradients. We calculate the decaying averages of past gradients m_t and past squared gradients v_t, respectively, as follows:

m_t = β_1 m_{t−1} + (1 − β_1) g_t   (15)

v_t = β_2 v_{t−1} + (1 − β_2) g_t²   (16)
m_t and v_t are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients, respectively. Because m_t and v_t are initialized as vectors of zeros, the authors of Adam noticed that they are biased towards zero, particularly during the initial time steps and when the decay rates are small (i.e., β_1 and β_2 are close to 1).
Bias-corrected first and second moment estimates are computed to counteract these biases:

m̂_t = m_t/(1 − β_1^t),  v̂_t = v_t/(1 − β_2^t)   (17)

Parameters are then updated by:

θ_{t+1} = θ_t − η m̂_t/(√v̂_t + ε)   (18)

where ε is a small smoothing constant. The default values in this study are β_1 = 0.9 and β_2 = 0.999, with learning rate η = 0.001. More detail about this method is available in Reference [35].
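The Adam update can be sketched in NumPy. This is a sketch of the standard update, with ε = 1e-8 assumed as the smoothing constant:

```python
import numpy as np

class Adam:
    """Minimal Adam optimizer: decaying moment averages, bias
    correction, and the parameter update."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = 0.0
        self.t = 0

    def step(self, theta, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad        # first moment
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2   # second moment
        m_hat = self.m / (1 - self.b1 ** self.t)                # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

The per-parameter division by √v̂_t is what gives Adam its adaptive learning rates.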
To create the FNN architecture, one must first determine the number of layers of each type and the number of nodes in each of these layers. An FNN often contains one or more hidden layers of sigmoid neurons, followed by an output layer of linear neurons or nodes. Having multiple layers of neurons with nonlinear activation functions allows the network to learn nonlinear relationships between the input and output vectors [38]. There is debate surrounding whether FNN performance improves with the addition of more hidden layers. It has been found that the instances where performance improves with a second (or third, etc.) hidden layer are very few; thus, one hidden layer is claimed to be adequate for most problems FNN aims to solve. The number of neurons in the input layer is equal to the number of input features in the data set. The output layer contains only a single node, namely the bed porosity. The optimal size of the hidden layer is normally in the range between the sizes of the input and output layers [39]. In our study, an FNN designed for the largest number of inputs (10) was created with three layers: 10 nodes in the input layer, 8 in the hidden layer, and 1 in the output layer.
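The resulting 10-8-1 architecture (sigmoid hidden layer, linear output node) can be sketched in NumPy. The weights here are randomly initialized purely for illustration; the trained values of the study are not reproduced:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
W1 = rng.normal(scale=0.1, size=(8, 10))   # input (10 nodes) -> hidden (8 nodes)
b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(1, 8))    # hidden (8 nodes) -> output (1 node)
b2 = np.zeros(1)

def predict_porosity(features):
    """Forward pass of the 10-8-1 FNN: sigmoid hidden layer,
    linear single-node output (the bed porosity)."""
    h = sigmoid(W1 @ features + b1)
    return W2 @ h + b2

y = predict_porosity(np.ones(10))
```

In practice, W1, b1, W2, and b2 would be fitted with the backpropagation and Adam procedures described earlier in this section.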